deeptab.models

class deeptab.models.MambularClassifier(*args: Any, **kwargs: Any)[source]

Mambular classifier. This class extends the SklearnBaseClassifier class and uses the Mambular model with the default Mambular configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Mambular model, with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • d_conv (int, default=4) – Kernel size of the 1D convolution over the sequence of columns.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the time step Δ ("dt") in the selective state-space layers.

  • d_state (int, default=128) – Dimensionality of the state in the state-space layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection.

  • dt_init (str, default="random") – Initialization method for the Δ projection weights.

  • dt_max (float, default=0.1) – Maximum value of Δ at initialization.

  • dt_min (float, default=1e-04) – Minimum value of Δ at initialization.

  • dt_init_floor (float, default=1e-04) – Lower bound (floor) applied to the initial Δ values.

  • norm (str, default="RMSNorm") – Type of normalization used (‘RMSNorm’, etc.).

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before being passed to Mamba layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to use (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token to the input sequences.

  • use_pscan (bool, default=False) – Whether to use the parallel scan (PSCAN) for the state-space model.

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • conv_bias (bool, default=False) – Whether to use a bias in the 1D convolution before each Mamba block.

  • AD_weight_decay (bool, default=True) – Whether to also apply weight decay to the A and D matrices in Mamba.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
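The default numerical_preprocessing="ple" (piecewise linear encoding) turns each numerical value into an n_bins-dimensional vector. The pure-Python sketch below shows the textbook form of that encoding with uniformly spaced edges, as suggested by binning_strategy="uniform". It is an illustration under assumptions, not deeptab's preprocessor: the helper names uniform_edges and ple_encode are hypothetical, and the library may place edges differently (for example by quantiles, or with decision-tree bins when use_decision_tree_bins=True) and may treat the outermost bins differently.

```python
def uniform_edges(values, n_bins):
    """Return n_bins + 1 equally spaced bin edges from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    # Interior edges are lo, lo + step, ...; the last edge is set to hi exactly.
    return [lo + i * step for i in range(n_bins)] + [hi]


def ple_encode(x, edges):
    """Piecewise linear encoding of a scalar x given sorted bin edges.

    Component t is 0.0 if x lies at or below bin t, 1.0 if x lies at or
    above it, and the fractional position of x inside bin t otherwise.
    """
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x <= left:
            encoding.append(0.0)
        elif x >= right:
            encoding.append(1.0)
        else:
            encoding.append((x - left) / (right - left))
    return encoding


edges = uniform_edges([0.0, 3.0, 10.0], n_bins=4)
print(edges)                   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(ple_encode(6.0, edges))  # [1.0, 1.0, 0.4, 0.0]
```

Each component saturates at 0 or 1 outside its bin, so the encoding is monotone in x and keeps the ordering of the raw values.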

Examples

>>> from deeptab.models import MambularClassifier
>>> model = MambularClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
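The interplay of val_size, X_val, y_val, and random_state can be sketched as follows. This is a hypothetical plain-Python helper (split_train_val is not a deeptab function) that mirrors the documented rules: an explicit validation set is used as-is, otherwise a reproducibly shuffled fraction val_size of X and y is held out. We have not checked whether deeptab delegates to scikit-learn's train_test_split, so its exact rounding and shuffling may differ.

```python
import random


def split_train_val(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Hypothetical helper mirroring the documented val_size / X_val rules."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        # An explicit validation set is used as-is; X and y are not split.
        return X, y, X_val, y_val

    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # reproducible shuffle
    n_val = int(round(len(X) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = list(range(10))
y = [2 * v for v in X]
X_tr, y_tr, X_va, y_va = split_train_val(X, y, val_size=0.2)
print(len(X_tr), len(X_va))  # 8 2
```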

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
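The metrics argument pairs each metric function with a flag saying whether it needs probabilities. The sketch below illustrates that dispatch with plain-Python metrics and a stub model; it is a hypothetical reimplementation for illustration only (StubClassifier, accuracy, and log_loss here are not deeptab or scikit-learn objects). With the real estimator you would pass functions such as sklearn.metrics.accuracy_score and sklearn.metrics.log_loss in the same (function, needs_proba) form that score uses for its default.

```python
import math


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    # proba[i][k] is the predicted probability of class k for sample i.
    return -sum(math.log(max(row[t], eps)) for t, row in zip(y_true, proba)) / len(y_true)


class StubClassifier:
    """Stand-in for a fitted classifier that returns fixed probabilities."""

    def __init__(self, proba):
        self._proba = proba

    def predict_proba(self, X):
        return self._proba

    def predict(self, X):
        # Label = index of the most probable class in each row.
        return [max(range(len(row)), key=row.__getitem__) for row in self._proba]


def evaluate(model, X, y_true, metrics):
    """Dispatch each (metric_fn, needs_proba) pair to predict or predict_proba."""
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        y_hat = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = metric_fn(y_true, y_hat)
    return scores


model = StubClassifier([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
metrics = {"accuracy": (accuracy, False), "log_loss": (log_loss, True)}
print(evaluate(model, X=None, y_true=[0, 1, 1], metrics=metrics))
```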

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
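The patience, monitor, and mode arguments describe a standard early-stopping rule. A minimal plain-Python sketch of that rule follows, modelled on the behaviour of PyTorch Lightning's EarlyStopping callback with min_delta=0. We assume, without having checked deeptab's source, that fit passes these arguments to such a callback; early_stopping_epoch is a hypothetical name.

```python
def early_stopping_epoch(history, patience=15, mode="min"):
    """Return the epoch index at which training would stop, or None.

    history holds the monitored metric (e.g. val_loss) after each epoch.
    """
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        if best is None or (value < best if mode == "min" else value > best):
            best, wait = value, 0  # strict improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85, 0.81], patience=3))  # 4
```

With mode="max" the same loop stops once the metric (for example a validation accuracy) has not risen for patience epochs.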

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
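Under the usual scikit-learn convention, predict returns the label whose column in predict_proba is largest, so the two methods agree row by row. This page does not state that explicitly, so treat the following sketch as an assumption; predict_from_proba is a hypothetical helper, and with NumPy the same mapping is classes[np.argmax(proba, axis=1)].

```python
def predict_from_proba(proba, classes):
    """Map each row of class probabilities to the most probable class label."""
    labels = []
    for row in proba:
        best_k = max(range(len(row)), key=lambda k: row[k])
        labels.append(classes[best_k])
    return labels


proba = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
print(predict_from_proba(proba, classes=["cat", "dog", "eel"]))  # ['dog', 'cat']
```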

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
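The pretraining objective is described here only at a high level (neighbourhood structure in feature space, k_neighbors, temperature). As an illustration, the sketch below implements one common objective of that kind: an InfoNCE-style loss in which each sample's k nearest neighbours in the input features are positives and all other samples are negatives. This is not deeptab's implementation; use_positive, use_negative, and pool_sequence are not modelled, all function names are hypothetical, and k_neighbors=1 is chosen to suit the toy data rather than to match the method's default.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbor_contrastive_loss(features, embeddings, k_neighbors=1, temperature=0.1):
    """InfoNCE-style loss: the k nearest neighbours of sample i in feature space
    are its positives; every other sample in the batch acts as a negative."""
    n = len(features)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = sorted(others, key=lambda j: sq_dist(features[i], features[j]))[:k_neighbors]
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]     # neighbours embed close
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]  # neighbours embed apart
print(neighbor_contrastive_loss(features, aligned) < neighbor_contrastive_loss(features, misaligned))  # True
```

A lower temperature sharpens the softmax over similarities, so the loss penalizes non-neighbours that sit close in embedding space more strongly.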

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambularRegressor(*args: Any, **kwargs: Any)[source]

Mambular regressor. This class extends the SklearnBaseRegressor class and uses the Mambular model with the default Mambular configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Mambular model, with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • d_conv (int, default=4) – Kernel size of the 1D convolution over the sequence of columns.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the time step Δ ("dt") in the selective state-space layers.

  • d_state (int, default=128) – Dimensionality of the state in the state-space layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection.

  • dt_init (str, default="random") – Initialization method for the Δ projection weights.

  • dt_max (float, default=0.1) – Maximum value of Δ at initialization.

  • dt_min (float, default=1e-04) – Minimum value of Δ at initialization.

  • dt_init_floor (float, default=1e-04) – Lower bound (floor) applied to the initial Δ values.

  • norm (str, default="RMSNorm") – Type of normalization used (‘RMSNorm’, etc.).

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before being passed to Mamba layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to use (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token to the input sequences.

  • use_pscan (bool, default=False) – Whether to use the parallel scan (PSCAN) for the state-space model.

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • conv_bias (bool, default=False) – Whether to use a bias in the 1D convolution before each Mamba block.

  • AD_weight_decay (bool, default=True) – Whether to also apply weight decay to the A and D matrices in Mamba.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
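In Mamba, the dt_* parameters refer to the input-dependent discretization step Δ of the selective state-space model. In the reference Mamba implementation, the bias of the Δ projection is initialized so that softplus(bias) is sampled log-uniformly between dt_min and dt_max and clamped below at dt_init_floor. The plain-Python sketch below reproduces that scheme with this page's defaults; we assume, without having checked, that Mambular's Mamba blocks follow it. init_dt_bias is a hypothetical name, n_channels stands for the number of inner channels, and dt_init and dt_scale (which in the reference code control the projection weights, not the bias) are not modelled.

```python
import math
import random


def softplus(x):
    return math.log1p(math.exp(x))


def init_dt_bias(n_channels, dt_min=1e-4, dt_max=0.1, dt_init_floor=1e-4, seed=0):
    """Sample Δ log-uniformly in [dt_min, dt_max], clamp it at dt_init_floor,
    and return the inverse-softplus values so that softplus(bias) == Δ."""
    rng = random.Random(seed)
    biases = []
    for _ in range(n_channels):
        log_dt = rng.random() * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
        dt = max(math.exp(log_dt), dt_init_floor)
        # Inverse of softplus: x = log(exp(dt) - 1) = dt + log(1 - exp(-dt)).
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases


initial_dt = [softplus(b) for b in init_dt_bias(8)]
print(all(1e-4 * 0.999 <= dt <= 0.1 * 1.001 for dt in initial_dt))  # True
```

Sampling in log space spreads the initial step sizes over three orders of magnitude, so some channels start with long memory (small Δ) and others with short memory (large Δ).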

Examples

>>> from deeptab.models import MambularRegressor
>>> model = MambularRegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
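The lr_patience and lr_factor arguments describe a reduce-on-plateau learning-rate schedule. The sketch below mirrors the core rule of PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau (patience, factor) while ignoring its threshold, cooldown, and min_lr options; whether fit uses exactly that scheduler is an assumption, not something this page states. plateau_schedule is a hypothetical name.

```python
def plateau_schedule(val_losses, lr=1e-3, lr_patience=10, lr_factor=0.1):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by lr_factor once the validation loss has failed
    to improve for more than lr_patience consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                lr *= lr_factor
                bad_epochs = 0  # start counting again after a reduction
        history.append(lr)
    return history


print(plateau_schedule([1.0, 0.9, 0.95, 0.95, 0.95, 0.8], lr=1.0, lr_patience=2, lr_factor=0.5))
# [1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
```

Note that the learning-rate patience is counted separately from the early-stopping patience, so a typical setup uses a smaller lr_patience than patience to give the reduced rate a chance to help before training stops.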

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
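Any callable with the signature metric(y_true, y_pred) can replace the default mean squared error. The hypothetical mae function below is one such metric. scikit-learn's sklearn.metrics.mean_absolute_error would serve equally well.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error with the metric(y_true, y_pred) signature that score() expects."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# With a fitted model this would be passed as:
#   model.score(X_test, y_test, metric=mae)
error = mae([3.0, -0.5, 2.0], [2.5, 0.0, 2.0])
```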

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambularLSS(*args: Any, **kwargs: Any)[source]

Mambular LSS for distributional regression. This class extends the SklearnBaseLSS class and uses the Mambular model with the default Mambular configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Default Mambular model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • d_conv (int, default=4) – Size of convolution over columns.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the discretization step Δ ("dt") of the state-space layers; "auto" derives it from d_model.

  • d_state (int, default=128) – Dimensionality of the latent state in the state-space (Mamba) layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection weights.

  • dt_init (str, default="random") – Initialization method for the Δ projection weights.

  • dt_max (float, default=0.1) – Maximum initial value of the step size Δ.

  • dt_min (float, default=1e-04) – Minimum initial value of the step size Δ.

  • dt_init_floor (float, default=1e-04) – Lower bound applied to the initial values of Δ.

  • norm (str, default="RMSNorm") – Type of normalization used (‘RMSNorm’, etc.).

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle the embeddings before they are passed to the Mamba layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to use (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token to the input sequences.

  • use_pscan (bool, default=False) – Whether to use a parallel scan (PSCAN) to compute the state-space recurrence.

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • conv_bias (bool, default=False) – Whether to use a bias in the 1D convolution before each Mamba block.

  • AD_weight_decay (bool, default=True) – Whether to also apply weight decay to the A and D matrices in Mamba.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
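In Mamba-style state-space layers, the dt_* parameters govern the discretization step Δ. The sketch below follows the initialization used in the reference Mamba implementation, and whether deeptab reproduces it exactly is an assumption. dt_rank="auto" resolves to ceil(d_model / 16). The initial step sizes are drawn log-uniformly from [dt_min, dt_max], clamped below at dt_init_floor, and stored as their inverse softplus, so that softplus(bias) recovers them. resolve_dt_rank and init_dt_bias are hypothetical names.

```python
import math
import random
from typing import List, Union


def resolve_dt_rank(dt_rank: Union[str, int], d_model: int) -> int:
    """'auto' -> ceil(d_model / 16), following the reference Mamba code (assumed)."""
    return math.ceil(d_model / 16) if dt_rank == "auto" else int(dt_rank)


def init_dt_bias(d_inner: int, dt_min: float = 1e-4, dt_max: float = 0.1,
                 dt_init_floor: float = 1e-4, seed: int = 0) -> List[float]:
    """Draw initial step sizes log-uniformly in [dt_min, dt_max]; return their inverse softplus."""
    rng = random.Random(seed)
    biases = []
    for _ in range(d_inner):
        log_dt = math.log(dt_min) + rng.random() * (math.log(dt_max) - math.log(dt_min))
        dt = max(math.exp(log_dt), dt_init_floor)
        # inverse of softplus(x) = log(1 + e^x) is x = dt + log(1 - e^(-dt))
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases


def softplus(x: float) -> float:
    return math.log1p(math.exp(x))


d_model, expand_factor = 64, 2  # defaults listed above
rank = resolve_dt_rank("auto", d_model)
# assumed: the inner width of a Mamba block is expand_factor * d_model, as in the reference code
dt_bias = init_dt_bias(d_inner=expand_factor * d_model)
initial_dts = [softplus(b) for b in dt_bias]  # step sizes the layer starts from
```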

Examples

>>> from deeptab.models import MambularLSS
>>> model = MambularLSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
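When X_val is omitted, a fraction val_size of the training data is held out, and random_state fixes the shuffle. The pure-Python sketch below illustrates that behavior. The helper split_train_val is hypothetical, and deeptab may well delegate this step to scikit-learn's train_test_split instead.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def split_train_val(X: Sequence, y: Sequence, val_size: float = 0.2,
                    X_val: Optional[Sequence] = None, y_val: Optional[Sequence] = None,
                    random_state: int = 101) -> Tuple[list, list, list, list]:
    """Return (X_train, y_train, X_val, y_val), honoring an explicit validation set if given."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return list(X), list(y), list(X_val), list(y_val)  # no split; val_size is ignored
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # reproducible shuffle
    n_val = math.ceil(val_size * len(X))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])
```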

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
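The batch_size argument only controls how many rows pass through the embedding layer at once. The outputs are concatenated in input order. The sketch below shows that pattern with a hypothetical stand-in, toy_embed. The real method runs the fitted model's embedding layer and returns a torch.Tensor.

```python
from typing import Callable, List, Sequence


def encode_in_batches(rows: Sequence, embed_fn: Callable[[Sequence], List],
                      batch_size: int = 64) -> List:
    """Apply embed_fn to consecutive slices of rows and concatenate the results in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    encoded: List = []
    for start in range(0, len(rows), batch_size):
        encoded.extend(embed_fn(rows[start:start + batch_size]))
    return encoded


batch_sizes_seen: List[int] = []


def toy_embed(batch: Sequence[float]) -> List[List[float]]:
    # stand-in for an embedding layer: maps each scalar row to a 2-d vector
    batch_sizes_seen.append(len(batch))
    return [[x, x * x] for x in batch]


encoded = encode_in_batches(list(range(10)), toy_embed, batch_size=4)
```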

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
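The lr_patience and lr_factor parameters describe a reduce-on-plateau schedule. Once the validation loss has stopped improving for more than lr_patience epochs, the learning rate is multiplied by lr_factor. The simplified sketch below uses a hypothetical PlateauScheduler with no cooldown and no improvement threshold. The assumption is that training relies on a PyTorch scheduler such as torch.optim.lr_scheduler.ReduceLROnPlateau, whose counting rule the sketch mirrors.

```python
from typing import List


class PlateauScheduler:
    """Reduce-on-plateau rule for a monitored loss that should decrease (hypothetical helper)."""

    def __init__(self, lr: float, lr_patience: int, lr_factor: float):
        self.lr = lr
        self.lr_patience = lr_patience
        self.lr_factor = lr_factor
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            # as in torch's ReduceLROnPlateau, reduce once the count exceeds the patience
            if self.bad_epochs > self.lr_patience:
                self.lr *= self.lr_factor
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauScheduler(lr=1e-3, lr_patience=2, lr_factor=0.1)
lr_history: List[float] = [scheduler.step(loss) for loss in [1.0, 0.9, 0.95, 0.93, 0.92, 0.91]]
```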

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
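A simplified loop can show how fixed_params, time (the number of trials), and prune_epoch interact. To keep the sketch self-contained, it substitutes plain random search for the Bayesian optimizer and a synthetic toy_train_fn for real training. The search space, toy_train_fn, and the pruning rule are all assumptions for illustration. Under that rule, a trial stops early when its loss at prune_epoch is worse than the best loss any earlier trial reached at that epoch, which corresponds to the prune_by_epoch=True case only.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple


def random_search_with_pruning(train_fn: Callable[[Dict, int], float],
                               search_space: Dict[str, List],
                               fixed_params: Dict,
                               n_trials: int = 20, max_epochs: int = 50,
                               prune_epoch: int = 5, seed: int = 0
                               ) -> Tuple[Optional[Dict], float, int]:
    """Random search; prune a trial whose loss at prune_epoch is worse than the best seen there."""
    rng = random.Random(seed)
    best_params: Optional[Dict] = None
    best_loss = math.inf
    best_at_prune = math.inf
    n_pruned = 0
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in search_space.items()}
        params.update(fixed_params)  # fixed values always override sampled ones
        losses: List[float] = []
        pruned = False
        for epoch in range(max_epochs):
            losses.append(train_fn(params, epoch))
            if epoch == prune_epoch:
                if losses[-1] > best_at_prune:
                    pruned = True
                    break
                best_at_prune = losses[-1]
        if pruned:
            n_pruned += 1
        elif min(losses) < best_loss:
            best_params, best_loss = params, min(losses)
    return best_params, best_loss, n_pruned


def toy_train_fn(params: Dict, epoch: int) -> float:
    # synthetic validation loss: lowest near lr=1e-3 and d_model=64, decreasing with epochs
    return abs(math.log10(params["lr"]) + 3) + abs(params["d_model"] - 64) / 64 + 1.0 / (epoch + 1)


space = {"lr": [1e-4, 1e-3, 1e-2], "d_model": [32, 64, 128]}
best, best_loss, n_pruned = random_search_with_pruning(
    toy_train_fn, space, fixed_params={"use_cls": False}, n_trials=30, max_epochs=20)
```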

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Scoring metric. Currently, only the negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
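For family='normal', the negative log-likelihood of an observation y under a predicted mean μ and standard deviation σ is 0.5 * log(2πσ²) + (y - μ)² / (2σ²). The sketch below averages that quantity over samples. How MambularLSS lays out the predicted distribution parameters, and whether score averages or sums, are assumptions here. gaussian_nll is a hypothetical helper.

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean negative log-likelihood of y under independent Normal(mu, sigma) predictions."""
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("standard deviations must be positive")
        total += 0.5 * math.log(2 * math.pi * si**2) + (yi - mi) ** 2 / (2 * si**2)
    return total / len(y)
```

Lower values are better. A prediction is penalized both for a mean far from the target and for a standard deviation that is too small or too large for the observed error.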

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.FTTransformerClassifier(*args: Any, **kwargs: Any)[source]

FTTransformer classifier. This class extends the SklearnBaseClassifier class and uses the FTTransformer model with the default FTTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the FT Transformer model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization to improve numerical stability.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
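The default transformer_activation, ReGLU, is a gated linear unit variant. Its input is split in half along the last dimension, and one half is multiplied by the ReLU of the other, so the output is half as wide as the input. The NumPy function below is an illustrative sketch following the convention of the original FT-Transformer code, where the first half is gated by the ReLU of the second. It is not deeptab's ReGLU class, and which half deeptab gates is an assumption.

```python
import numpy as np


def reglu(x: np.ndarray) -> np.ndarray:
    """ReGLU: split the last axis into halves (a, b) and return a * relu(b)."""
    if x.shape[-1] % 2 != 0:
        raise ValueError("last dimension must be even")
    a, b = np.split(x, 2, axis=-1)
    return a * np.maximum(b, 0.0)


# A 256-wide hidden activation is gated down to 128 features
hidden = np.random.default_rng(0).normal(size=(4, 256))
gated = reglu(hidden)
```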

Examples

>>> from deeptab.models import FTTransformerClassifier
>>> model = FTTransformerClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
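The metrics argument pairs each metric function with a flag that says whether the metric consumes probabilities. The sketch below shows how such a dictionary can be dispatched to predict or predict_proba. StubClassifier, evaluate_metrics, accuracy, and log_loss_from_proba are hypothetical stand-ins. With a fitted model and scikit-learn's metrics, the equivalent call would be model.evaluate(X_test, y_test, metrics={"log_loss": (log_loss, True), "accuracy": (accuracy_score, False)}).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss_from_proba(y_true: Sequence[int], proba: Sequence[Sequence[float]],
                        eps: float = 1e-15) -> float:
    """Cross-entropy using the probability assigned to each sample's true class."""
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


class StubClassifier:
    """Stand-in exposing predict/predict_proba like a fitted binary classifier."""

    def predict_proba(self, X: Sequence[float]) -> List[List[float]]:
        return [[1 - x, x] for x in X]  # treat each input as P(class 1)

    def predict(self, X: Sequence[float]) -> List[int]:
        return [int(p[1] >= 0.5) for p in self.predict_proba(X)]


def evaluate_metrics(model, X, y_true,
                     metrics: Dict[str, Tuple[Callable, bool]]) -> Dict[str, float]:
    """Compute each metric on probabilities or on labels, depending on its flag."""
    proba = labels = None
    scores = {}
    for name, (fn, needs_proba) in metrics.items():
        if needs_proba:
            proba = proba if proba is not None else model.predict_proba(X)
            scores[name] = fn(y_true, proba)
        else:
            labels = labels if labels is not None else model.predict(X)
            scores[name] = fn(y_true, labels)
    return scores


scores = evaluate_metrics(StubClassifier(), [0.9, 0.2, 0.6], [1, 0, 0],
                          {"log_loss": (log_loss_from_proba, True), "accuracy": (accuracy, False)})
```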

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metric objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
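The patience, monitor, and mode parameters follow the usual early-stopping contract. Training halts once the monitored quantity has failed to improve for patience consecutive epochs. An improvement means a decrease for mode='min' and an increase for mode='max'. PyTorch Lightning's EarlyStopping callback takes parameters with the same names. The hypothetical stopping_epoch function below reproduces the rule in its simplest form, without a minimum-delta threshold.

```python
from typing import Optional, Sequence


def stopping_epoch(history: Sequence[float], patience: int = 15,
                   mode: str = "min") -> Optional[int]:
    """Return the epoch index at which training would stop, or None if it runs to completion."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:  # patience epochs in a row without improvement
                return epoch
    return None
```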

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
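For a classifier, the label returned by predict is conventionally the class with the highest probability from predict_proba. The NumPy snippet below illustrates that relationship, assuming deeptab follows the convention, which the descriptions above do not state explicitly. The proba values are made-up example output, and classes is a hypothetical label array.

```python
import numpy as np

# Hypothetical output of predict_proba for three samples and three classes
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.25, 0.5, 0.25]])
assert np.allclose(proba.sum(axis=1), 1.0)  # each row is a probability distribution
classes = np.array(["a", "b", "c"])          # hypothetical label encoding
labels = classes[np.argmax(proba, axis=1)]   # most probable class per sample
```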

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.FTTransformerRegressor(*args: Any, **kwargs: Any)[source]

FTTransformer regressor. This class extends the SklearnBaseRegressor class and uses the FTTransformer model with the default FTTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the FT Transformer model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization to improve numerical stability.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
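`n_bins` and `binning_strategy` control where numerical features are cut before binning-based encodings such as PLE. As a rough illustration (not deeptab's preprocessor), uniform edges split the observed range into equal widths, while quantile edges place roughly equal numbers of samples in each bin. The helper names below are hypothetical, and the quantile rule is simple linear interpolation between sorted values.

```python
# Hypothetical helpers contrasting "uniform" and "quantile" bin placement.
def uniform_edges(values, n_bins):
    """Equal-width bin edges spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def quantile_edges(values, n_bins):
    """Edges at evenly spaced quantiles (linear interpolation between order statistics)."""
    s = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (len(s) - 1) / n_bins  # fractional index into the sorted values
        low = int(pos)
        high = min(low + 1, len(s) - 1)
        frac = pos - low
        edges.append(s[low] + frac * (s[high] - s[low]))
    return edges

# A skewed feature: most values are small, one is large.
x = [1, 2, 2, 3, 3, 3, 4, 5, 50]
print(uniform_edges(x, 4))   # [1.0, 13.25, 25.5, 37.75, 50.0]
print(quantile_edges(x, 4))  # [1.0, 2.0, 3.0, 4.0, 50.0]
```

With uniform edges, eight of the nine values land in the first bin; quantile edges spread them out, which is why quantile placement tends to suit skewed features.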

Examples

>>> from deeptab.models import FTTransformerRegressor
>>> model = FTTransformerRegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
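`evaluate` applies every entry of the `metrics` dictionary to the same predictions and returns the scores under the same keys. A plain-Python sketch of that loop follows; `mse`, `mae` and `evaluate_predictions` are hypothetical, and a fixed list stands in for the output of `predict(X)`.

```python
# Hypothetical metric functions with the metric(y_true, y_pred) signature.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_predictions(y_true, y_pred, metrics):
    """Apply each metric(y_true, y_pred) and collect the scores by name."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}

y_true = [3.0, 0.0, 2.0]
y_pred = [2.0, 0.0, 4.0]  # stand-in for model.predict(X)
print(evaluate_predictions(y_true, y_pred, {"mse": mse, "mae": mae}))
# {'mse': 1.6666666666666667, 'mae': 1.0}
```

With the estimator, the equivalent call would be along the lines of `model.evaluate(X_test, y_test, metrics={"mse": sklearn.metrics.mean_squared_error})`.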

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Directory in which model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
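`patience`, `monitor` and `mode` describe early stopping on a monitored validation metric. The loop below simulates that rule on a made-up sequence of `val_loss` values. It is a sketch of the general technique (as implemented, for example, by PyTorch Lightning's `EarlyStopping` callback), not deeptab's code, and `stop_epoch` is a hypothetical helper.

```python
# Hypothetical helper: simulate patience-based early stopping.
def stop_epoch(history, patience=15, mode="min"):
    """Return the 0-based epoch at which training stops, or None if it never does."""
    better = (lambda a, b: a < b) if mode == "min" else (lambda a, b: a > b)
    best, wait = None, 0
    for epoch, value in enumerate(history):
        if best is None or better(value, best):
            best, wait = value, 0  # an improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # patience exhausted
    return None

val_loss = [0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.64]
print(stop_epoch(val_loss, patience=3))  # 5
```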

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
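The sketch below shows the general idea behind neighbor-based contrastive pretraining, under stated assumptions: the `k_neighbors` nearest samples in the raw feature space are treated as positives for each anchor, and an InfoNCE-style loss over temperature-scaled cosine similarities of the embeddings pulls them together. It is an illustration, not deeptab's `_pretrain` implementation, and `neighbor_contrastive_loss` and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn(features, i, k):
    """Indices of the k nearest neighbors of sample i (squared Euclidean, excluding i)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(features[i], f)), j)
             for j, f in enumerate(features) if j != i]
    return [j for _, j in sorted(dists)[:k]]

# Hypothetical loss: feature-space neighbors are the positives (assumption).
def neighbor_contrastive_loss(features, embeddings, k_neighbors=1, temperature=0.1):
    """Mean InfoNCE-style loss over all anchors."""
    n = len(features)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity of the anchor to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        positives = knn(features, i, k_neighbors)
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n

features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]   # neighbors close in embedding space
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]  # neighbors far apart
print(neighbor_contrastive_loss(features, aligned) < neighbor_contrastive_loss(features, shuffled))  # True
```

Lowering `temperature` widens the gaps between logits, so embeddings that separate feature-space neighbors are penalized more sharply.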

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.FTTransformerLSS(*args: Any, **kwargs: Any)[source]

FTTransformer for distributional regression.

This class extends the SklearnBaseLSS class and uses the FTTransformer model with the default FTTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the FT Transformer model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization to improve numerical stability.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
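`cat_cutoff` and `treat_all_integers_as_numerical` jointly decide whether an integer column is handled as categorical or numerical. The exact rule is not spelled out here. The sketch assumes one plausible reading: an integer column is categorical when its share of distinct values is below `cat_cutoff` (if a float) or its number of distinct values is below `cat_cutoff` (if an int). `infer_kind` is a hypothetical helper, and `min_unique_vals` is left out because its interaction with `cat_cutoff` is not specified.

```python
# Hypothetical helper; the rule below is an assumed reading of cat_cutoff.
def infer_kind(column, cat_cutoff=0.03, treat_all_integers_as_numerical=False):
    """Classify a column as 'categorical' or 'numerical'."""
    if not all(isinstance(v, (int, float)) for v in column):
        return "categorical"  # non-numeric values such as strings
    is_integer = all(isinstance(v, int) for v in column)
    if not is_integer or treat_all_integers_as_numerical:
        return "numerical"  # floats, or integers forced to numerical
    n_unique = len(set(column))
    if isinstance(cat_cutoff, float):
        is_cat = n_unique / len(column) < cat_cutoff  # share of distinct values
    else:
        is_cat = n_unique < cat_cutoff  # absolute number of distinct values
    return "categorical" if is_cat else "numerical"

codes = [1, 2, 3] * 100  # 3 distinct values in 300 rows (1% distinct)
ids = list(range(300))   # 300 distinct values in 300 rows
print(infer_kind(codes), infer_kind(ids))                       # categorical numerical
print(infer_kind(codes, treat_all_integers_as_numerical=True))  # numerical
print(infer_kind(codes, cat_cutoff=5))                          # categorical
```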

Examples

>>> from deeptab.models import FTTransformerLSS
>>> model = FTTransformerLSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family="normal")
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
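`lr`, `lr_patience` and `lr_factor` describe a reduce-on-plateau schedule: once the validation loss has failed to improve for more than `lr_patience` epochs, the learning rate is multiplied by `lr_factor`. A minimal simulation of that rule follows. PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` implements the same idea with extra options (threshold, cooldown) that are ignored here; `plateau_schedule` and the loss values are hypothetical.

```python
# Hypothetical helper: learning rate after each epoch under reduce-on-plateau.
def plateau_schedule(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.1):
    """Return the learning rate in effect after each epoch."""
    best, bad_epochs, lrs = float("inf"), 0, []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:  # patience exceeded -> decay the rate
                lr *= lr_factor
                bad_epochs = 0
        lrs.append(lr)
    return lrs

losses = [1.0, 0.8, 0.8, 0.81, 0.82, 0.7]
print([f"{v:g}" for v in plateau_schedule(losses)])
# ['0.001', '0.001', '0.001', '0.001', '0.0001', '0.0001']
```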

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – The distribution family the model predicts. If None, it is inferred from the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Directory in which model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The scoring metric. Currently only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
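For `family="normal"`, negative log-likelihood measures how little density the predicted distribution assigns to each observed target. The sketch assumes the model yields a mean and a standard deviation per sample and averages the per-sample NLL; how deeptab parameterizes the scale and reduces over samples is not stated here, and `gaussian_nll` is a hypothetical helper.

```python
import math

# Hypothetical helper: mean Gaussian NLL, assuming (mean, std) per sample.
def gaussian_nll(y, mu, sigma):
    """Mean of 0.5*log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * si ** 2) + (yi - mi) ** 2 / (2 * si ** 2)
    return total / len(y)

y = [0.0, 1.0]
print(round(gaussian_nll(y, mu=[0.0, 1.0], sigma=[1.0, 1.0]), 4))  # 0.9189
print(round(gaussian_nll(y, mu=[0.0, 1.0], sigma=[0.5, 0.5]), 4))  # 0.2258
print(round(gaussian_nll(y, mu=[1.0, 0.0], sigma=[0.5, 0.5]), 4))  # 2.2258
```

A sharper, correctly centered prediction lowers the NLL, while the same sharpness around the wrong mean is penalized heavily, so the metric rewards calibrated uncertainty rather than confidence alone.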

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
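`get_params` and `set_params` follow scikit-learn's estimator convention: constructor arguments are exposed as a dictionary, and `set_params` updates them and returns `self`, so calls can be chained. `TinyEstimator` below is a hypothetical, stripped-down class showing that contract; it omits the nested `component__param` handling that `deep=True` enables in scikit-learn.

```python
# Hypothetical estimator following the scikit-learn get/set_params contract.
class TinyEstimator:
    def __init__(self, d_model=128, n_layers=4):
        self.d_model = d_model
        self.n_layers = n_layers

    def get_params(self, deep=True):
        # scikit-learn derives this mapping from the __init__ signature.
        return {"d_model": self.d_model, "n_layers": self.n_layers}

    def set_params(self, **parameters):
        for name, value in parameters.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chaining

est = TinyEstimator().set_params(d_model=64)
print(est.get_params())  # {'d_model': 64, 'n_layers': 4}
```

The same contract is what lets tools such as `sklearn.base.clone` copy and reconfigure scikit-learn-compatible estimators.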

class deeptab.models.MLPClassifier(*args: Any, **kwargs: Any)[source]

Multi-Layer Perceptron classifier. This class extends the SklearnBaseClassifier class and uses the MLP model with the default MLP configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Multi-Layer Perceptron (MLP) model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the MLP.

  • activation (callable, default=nn.ReLU()) – Activation function for the MLP layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the MLP.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the MLP.

  • skip_connections (bool, default=False) – Whether to use skip connections in the MLP.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
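`layer_sizes` fixes the width of each hidden layer and therefore largely determines model size. Assuming each hidden layer is a plain linear layer with bias, and ignoring embedding, normalization or GLU weights that deeptab's MLP may add, the trainable-parameter count can be estimated as below. `dense_param_count` is hypothetical; `n_inputs` stands for the input width after preprocessing (binning-based encodings may widen it) and `n_outputs=2` assumes a binary task. On a built model, `get_number_of_params()` should report the actual figure.

```python
# Hypothetical estimate: weights + biases of a fully connected stack.
def dense_param_count(n_inputs, layer_sizes=(256, 128, 32), n_outputs=2):
    """Sum of in*out + out over consecutive layer widths."""
    widths = [n_inputs, *layer_sizes, n_outputs]
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

print(dense_param_count(20))  # 42466
```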

Examples

>>> from deeptab.models import MambularClassifier
>>> model = MambularClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
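The `val_size`/`X_val` rules above amount to: use the explicit validation set if one is given, otherwise carve a seeded random fraction out of the training data. The sketch below is a hypothetical stand-in (`split_train_val` is not a deeptab function); the `ceil` rounding mirrors scikit-learn's `train_test_split` and is an assumption about what deeptab does internally.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def split_train_val(
    X: Sequence, y: Sequence, val_size: float = 0.2,
    X_val: Optional[Sequence] = None, y_val: Optional[Sequence] = None,
    random_state: int = 101,
) -> Tuple[List, List, List, List]:
    """Hypothetical sketch of the documented val_size / X_val behaviour."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        # An explicit validation set is used as-is; val_size is ignored.
        return list(X), list(y), list(X_val), list(y_val)
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # seeded shuffle before splitting
    n_val = math.ceil(val_size * len(X))          # assumed rounding rule
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])
```

With a real estimator you would simply pass the arrays, e.g. `model.build_model(X_train, y_train, X_val=X_val, y_val=y_val)`, in which case `val_size` plays no role.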

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
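`batch_size` in `encode` controls how many rows are pushed through the embedding layer at once, not what the result is. The sketch below shows that chunk-and-concatenate pattern with a hypothetical `encode_batch` callable standing in for the model; with the network in evaluation mode, the concatenated output is expected to match a single full-size pass, so a larger batch mainly trades memory for speed.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(rows: Sequence, batch_size: int = 64) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def encode_in_batches(
    rows: Sequence, encode_batch: Callable[[Sequence], List], batch_size: int = 64
) -> List:
    """Hypothetical sketch: encode each batch and concatenate results in input order."""
    out: List = []
    for batch in iter_batches(rows, batch_size):
        out.extend(encode_batch(batch))
    return out


rows = list(range(130))
print([len(b) for b in iter_batches(rows, 64)])  # [64, 64, 2]
```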

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array with embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
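The `metrics` argument pairs each metric function with a flag that selects `predict_proba` (True) or `predict` (False). The sketch below illustrates that dispatch with a hypothetical stub model and two stdlib-only metrics so it runs without deeptab; `evaluate_sketch` and `StubClassifier` are illustrations, not library code.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    # proba[i][k] is the predicted probability of class k for sample i.
    eps = 1e-15
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


class StubClassifier:
    """Hypothetical stand-in: each input x is read as P(class 1)."""

    def predict_proba(self, X: Sequence[float]) -> List[List[float]]:
        return [[1.0 - x, x] for x in X]

    def predict(self, X: Sequence[float]) -> List[int]:
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


def evaluate_sketch(model, X, y_true, metrics: Dict[str, Tuple[Callable, bool]]) -> Dict[str, float]:
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        # The boolean flag picks probabilities or hard labels for this metric.
        y_hat = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = metric_fn(y_true, y_hat)
    return scores


scores = evaluate_sketch(
    StubClassifier(), [0.9, 0.2, 0.6], [1, 0, 0],
    {"accuracy": (accuracy, False), "log_loss": (log_loss, True)},
)
print(scores)
```

With a fitted classifier the same dictionary shape applies, for example `metrics={"accuracy": (accuracy_score, False), "log_loss": (log_loss, True)}` built from `sklearn.metrics`.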

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer.

Returns:

self – The fitted classifier.

Return type:

object
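The `patience`, `monitor`, and `mode` arguments describe patience-based early stopping. The sketch below assumes they behave like PyTorch Lightning's `EarlyStopping` callback with no minimum delta; `early_stopping_epoch` is a hypothetical helper that replays a recorded history of the monitored metric and reports where training would halt.

```python
from typing import Sequence


def early_stopping_epoch(history: Sequence[float], patience: int = 15, mode: str = "min") -> int:
    """Return the (0-based) epoch at which training would stop.

    Any strict improvement resets the counter; training stops once `patience`
    consecutive epochs fail to improve. Otherwise the last epoch is returned.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(history) - 1


print(early_stopping_epoch([1.0, 0.8, 0.85, 0.9, 0.95], patience=3))  # 4
```

In practice these knobs go straight to `fit`, e.g. `model.fit(X_train, y_train, X_val=X_val, y_val=y_val, max_epochs=50, patience=5, monitor="val_loss", mode="min")`.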

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
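The two pruning modes can be read as two different comparisons: one at a fixed epoch, one against the trial's best loss so far. The helper below is purely hypothetical. In particular, `best_reference` (the loss a trial must beat) is an assumed quantity, since this page does not state what a trial is compared against.

```python
from typing import Sequence


def should_prune(
    trial_val_losses: Sequence[float],
    best_reference: float,
    prune_by_epoch: bool = True,
    prune_epoch: int = 5,
) -> bool:
    """Hypothetical pruning rule illustrating prune_by_epoch=True vs False.

    trial_val_losses: validation loss per epoch for the current trial.
    best_reference: assumed threshold the trial has to beat to continue.
    """
    if prune_by_epoch:
        # Decide only once the trial has reached prune_epoch (0-based index here).
        if len(trial_val_losses) <= prune_epoch:
            return False
        return trial_val_losses[prune_epoch] > best_reference
    # Otherwise compare the best validation loss observed so far.
    return min(trial_val_losses) > best_reference
```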

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
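For a single-label classifier the two prediction methods are normally linked: the predicted label is the column with the highest probability. The sketch below assumes that relationship (it is not confirmed on this page); if targets were label-encoded during preprocessing, `predict` presumably maps the index back to the original class as well.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each row, the index of the most probable class (first index on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba]


print(labels_from_proba([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]))  # [1, 0]
```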

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
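To make `k_neighbors` and `temperature` concrete, here is a generic neighbourhood-contrastive objective in the InfoNCE style with multiple positives. This is an assumption-laden sketch, not deeptab's exact loss: positives are taken to be the k nearest rows in the raw feature space, similarity is cosine, and the `use_positive`/`use_negative`/`pool_sequence` switches are not modelled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_indices(features: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k rows closest to row i in the raw feature space (Euclidean)."""
    dists = sorted((math.dist(features[i], f), j) for j, f in enumerate(features) if j != i)
    return [j for _, j in dists[:k]]


def neighbour_contrastive_loss(
    features: Sequence[Vector], embeddings: Sequence[Vector],
    k_neighbors: int = 10, temperature: float = 0.1,
) -> float:
    """Mean over anchors of -log(sum_pos exp(cos/T) / sum_others exp(cos/T))."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        positives = knn_indices(features, i, k_neighbors)
        # Lower temperature sharpens the softmax over similarities.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in range(n) if j != i}
        denom = sum(math.exp(v) for v in logits.values())
        numer = sum(math.exp(logits[j]) for j in positives)
        total += -math.log(numer / denom)
    return total / n
```

Embeddings that keep feature-space neighbours close score a lower loss than embeddings that scramble them, which is the property pretraining pushes towards.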

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MLPRegressor(*args: Any, **kwargs: Any)[source]

Multi-Layer Perceptron regressor. This class extends the SklearnBaseRegressor class and uses the MLP model with the default MLP configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Multi-Layer Perceptron (MLP) model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the MLP.

  • activation (callable, default=nn.ReLU()) – Activation function for the MLP layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the MLP.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the MLP.

  • skip_connections (bool, default=False) – Whether to use skip connections in the MLP.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
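The difference between `binning_strategy="uniform"` and `"quantile"` shows up clearly on skewed data. The sketch below computes `n_bins + 1` edges both ways in plain Python; the uniform edges match `numpy.linspace(min, max, n_bins + 1)` and the quantile edges match `numpy.quantile` with its default linear interpolation. How deeptab computes its edges internally is not shown here, so treat this as illustrative.

```python
from typing import List, Sequence


def uniform_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Edges of n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Edges of n_bins roughly equal-count bins (linear interpolation between order statistics)."""
    s = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i / n_bins * (len(s) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(s) - 1)
        edges.append(s[lower] + (pos - lower) * (s[upper] - s[lower]))
    return edges


data = [1, 2, 3, 4, 100]          # one large outlier
print(uniform_edges(data, 2))     # [1, 50.5, 100]: four of five points share the first bin
print(quantile_edges(data, 2))    # [1, 3.0, 100]: the split adapts to the data
```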

Examples

>>> from deeptab.models import MLPRegressor
>>> model = MLPRegressor(layer_sizes=(128, 64, 32), dropout=0.1)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
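Unlike the classifier, the regressor's `metrics` values are plain callables with the signature `metric(y_true, y_pred)`, the same contract that `score` documents. A stdlib-only sketch of that contract, with hypothetical helper names:

```python
from typing import Callable, Dict, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_regression(y_true, y_pred, metrics: Dict[str, Callable]) -> Dict[str, float]:
    # Each value is called as metric(y_true, y_pred); no probability flag is involved.
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


scores = evaluate_regression([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0],
                             {"mse": mse, "mae": mae})
print(scores)  # {'mse': 0.375, 'mae': 0.5}
```

With a fitted regressor the equivalent call would be `model.evaluate(X_test, y_test, metrics={"mse": mean_squared_error, "mae": mean_absolute_error})` using `sklearn.metrics`.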

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer.

Returns:

self – The fitted regressor.

Return type:

object
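The `lr_patience` and `lr_factor` descriptions match a reduce-on-plateau schedule. The sketch below assumes behaviour like PyTorch's `ReduceLROnPlateau` in `min` mode, simplified to zero threshold, no cooldown, and no minimum learning rate; `plateau_schedule` is a hypothetical helper that returns the learning rate in effect after each epoch.

```python
from typing import List, Sequence


def plateau_schedule(
    val_losses: Sequence[float], lr: float, lr_patience: int, lr_factor: float
) -> List[float]:
    """Learning rate after each epoch under a simplified reduce-on-plateau rule."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > lr_patience:
            # The rate drops once the count of non-improving epochs exceeds the patience.
            lr *= lr_factor
            bad_epochs = 0
        history.append(lr)
    return history
```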

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
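As a rough cross-check on `get_number_of_params()`, the dense part of an MLP with the default `layer_sizes=(256, 128, 32)` can be counted by hand: each fully connected layer contributes `fan_in * fan_out` weights plus `fan_out` biases. The sketch assumes biased linear layers and takes `n_inputs` as the width after preprocessing (which encodings such as PLE may widen); the real total will also include any embedding, normalization, or head parameters, so expect it to differ.

```python
from typing import Sequence


def dense_param_count(
    n_inputs: int, layer_sizes: Sequence[int], n_outputs: int = 1, bias: bool = True
) -> int:
    """Weights (plus optional biases) of a plain stack of fully connected layers."""
    widths = [n_inputs, *layer_sizes, n_outputs]
    total = 0
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        total += fan_in * fan_out + (fan_out if bias else 0)
    return total


print(dense_param_count(10, (256, 128, 32)))  # 39873
```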

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MLPLSS(*args: Any, **kwargs: Any)[source]

Multi-Layer Perceptron for distributional regression. This class extends the SklearnBaseLSS class and uses the MLP model with the default MLP configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Multi-Layer Perceptron (MLP) model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the MLP.

  • activation (callable, default=nn.ReLU()) – Activation function for the MLP layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the MLP.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the MLP.

  • skip_connections (bool, default=False) – Whether to use skip connections in the MLP.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
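`feature_preprocessing` is described as overriding the global defaults. One way to picture that precedence is a lookup that checks the per-feature dictionary first and falls back to `numerical_preprocessing` or `categorical_preprocessing`. `resolve_preprocessing` and the feature names are hypothetical; the real preprocessor may apply additional rules.

```python
from typing import Dict, Optional


def resolve_preprocessing(
    feature: str,
    is_categorical: bool,
    feature_preprocessing: Optional[Dict[str, str]] = None,
    numerical_preprocessing: str = "ple",
    categorical_preprocessing: str = "int",
) -> str:
    """Hypothetical: a per-feature entry wins, otherwise the global default applies."""
    if feature_preprocessing and feature in feature_preprocessing:
        return feature_preprocessing[feature]
    return categorical_preprocessing if is_categorical else numerical_preprocessing


overrides = {"income": "standardization"}  # hypothetical column name
print(resolve_preprocessing("income", False, overrides))  # standardization
print(resolve_preprocessing("age", False, overrides))     # ple
```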

Examples

>>> from deeptab.models import MLPLSS
>>> model = MLPLSS(layer_sizes=(128, 64, 32), dropout=0.1)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
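A distributional model predicts distribution parameters per sample rather than a single point. For `family='normal'` that is naturally a mean and a spread, and a standard way to score such predictions is the Gaussian negative log-likelihood. The sketch assumes a mean/standard-deviation parameterization and is not claimed to be one of deeptab's built-in metrics.

```python
import math
from typing import Sequence


def gaussian_nll(y_true: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> float:
    """Average negative log-likelihood of y_true under N(mean, std**2)."""
    total = 0.0
    for y, mu, sigma in zip(y_true, mean, std):
        z = (y - mu) / sigma
        total += 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * z * z
    return total / len(y_true)
```

The score rewards sharp predictions only when they are right: shrinking `std` lowers the NLL for an accurate mean but raises it sharply for a wrong one.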

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a particular distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
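Together, patience, monitor and mode define an early-stopping rule. Training runs on PyTorch Lightning, whose EarlyStopping callback takes parameters with these names, so fit most likely delegates to it (an assumption; this page does not say so). The plain-Python sketch below re-implements only the basic rule and ignores extras such as min_delta. stopping_epoch is a hypothetical helper, not deeptab code:

```python
from typing import Iterable, Optional


def stopping_epoch(
    history: Iterable[float], patience: int = 15, mode: str = "min"
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None.

    `history` holds the monitored metric (e.g. val_loss) after each epoch.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best: Optional[float] = None
    wait = 0  # epochs since the last improvement
    for epoch, value in enumerate(history):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation loss improves for three epochs, then plateaus.
val_loss = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(stopping_epoch(val_loss, patience=3, mode="min"))  # 5
```

Training would stop at epoch 5 because epochs 3, 4 and 5 all fail to beat the best value of 0.7.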

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
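With a built PyTorch model, this kind of count is commonly written as sum(p.numel() for p in model.parameters() if p.requires_grad). To avoid a torch dependency, the sketch below uses a hypothetical Param record in place of torch.nn.Parameter. The record carries only a shape and a requires_grad flag:

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Param:
    """Hypothetical stand-in for torch.nn.Parameter: only shape and trainability."""
    shape: Tuple[int, ...]
    requires_grad: bool = True

    def numel(self) -> int:
        return math.prod(self.shape)


def count_params(params: Iterable[Param], requires_grad: bool = True) -> int:
    """Count trainable parameters only (requires_grad=True) or all of them."""
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)


# A linear layer 128 -> 64 (weight + bias) plus a frozen 1000 x 128 embedding table.
params = [Param((64, 128)), Param((64,)), Param((1000, 128), requires_grad=False)]
print(count_params(params))                       # 8256
print(count_params(params, requires_grad=False))  # 136256
```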

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
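For the "normal" family, the negative log-likelihood of an observation y under a predicted mean μ and variance σ² is ½·log(2πσ²) + (y − μ)²/(2σ²). The sketch below averages this over samples. For illustration only, it assumes each prediction row is (mean, variance). Check the actual column layout returned by predict for your family before reusing it:

```python
import math
from typing import Sequence, Tuple


def gaussian_nll(y: Sequence[float], params: Sequence[Tuple[float, float]]) -> float:
    """Mean negative log-likelihood of y under N(mean, variance) predictions.

    Assumes each entry of `params` is (mean, variance) with variance > 0.
    """
    if len(y) != len(params):
        raise ValueError("y and params must have the same length")
    total = 0.0
    for yi, (mu, var) in zip(y, params):
        if var <= 0:
            raise ValueError("variance must be positive")
        total += 0.5 * math.log(2 * math.pi * var) + (yi - mu) ** 2 / (2 * var)
    return total / len(y)


# A perfect mean with unit variance leaves only the normalising constant 0.5*log(2*pi).
print(round(gaussian_nll([1.0, 2.0], [(1.0, 1.0), (2.0, 1.0)]), 4))  # 0.9189
```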

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.TabTransformerClassifier(*args: Any, **kwargs: Any)[source]

TabTransformer classifier. This class extends the SklearnBaseClassifier class and uses the TabTransformer model with the default TabTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TabTransformer model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of layers in the transformer.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer layers.

  • transformer_dim_feedforward (int, default=512) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=True) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections between the layers of the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
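In standard multi-head attention (for example torch.nn.MultiheadAttention), d_model is split evenly across the attention heads, so it must be divisible by n_heads. With the defaults above, each head works on 128 / 8 = 16 dimensions. The pre-flight check below assumes the transformer encoder follows that standard scheme. head_dim is a hypothetical helper, not part of deeptab:

```python
def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head width for standard multi-head attention.

    Raises ValueError when d_model cannot be split evenly across heads.
    """
    if n_heads < 1:
        raise ValueError("n_heads must be positive")
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    return d_model // n_heads


print(head_dim(128, 8))  # 16 with the default configuration
try:
    head_dim(100, 8)
except ValueError as err:
    print(err)  # d_model=100 is not divisible by n_heads=8
```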

Examples

>>> from deeptab.models import TabTransformerClassifier
>>> model = TabTransformerClassifier()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
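The metrics argument pairs each name with a (function, needs_proba) tuple. The sketch below shows how that flag selects between probabilities and labels, using hand-written accuracy and log-loss functions and a hypothetical evaluate_sketch helper. In real use the callables would typically be sklearn.metrics functions such as accuracy_score and log_loss, and the predictions would come from the fitted model:

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence], float]


def accuracy(y_true: Sequence[int], labels: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, labels)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    # Mean negative log-probability assigned to the true class (probabilities clipped).
    eps = 1e-15
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


def evaluate_sketch(
    y_true: Sequence[int],
    labels: Sequence[int],
    proba: Sequence[Sequence[float]],
    metrics: Dict[str, Tuple[Metric, bool]],
) -> Dict[str, float]:
    """Feed probabilities or labels to each metric depending on its flag."""
    return {
        name: fn(y_true, proba if needs_proba else labels)
        for name, (fn, needs_proba) in metrics.items()
    }


y = [0, 1, 1]
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [max(range(len(p)), key=p.__getitem__) for p in proba]  # argmax -> [0, 1, 0]
scores = evaluate_sketch(
    y, labels, proba, {"accuracy": (accuracy, False), "log_loss": (log_loss, True)}
)
print(round(scores["accuracy"], 4))  # 0.6667
```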

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
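The val_size, X_val and random_state parameters interact as follows. An explicit validation set takes precedence. Otherwise, a seeded fraction of the training rows is held out. The plain-Python sketch below applies that rule. split_train_val is a hypothetical helper, and deeptab's own split routine may differ in details such as rounding or stratification:

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_train_val(
    X: Sequence,
    y: Sequence,
    val_size: float = 0.2,
    X_val: Optional[Sequence] = None,
    y_val: Optional[Sequence] = None,
    random_state: int = 101,
) -> Tuple[List, List, List, List]:
    """Return (X_train, y_train, X_val, y_val) following the documented rule."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return list(X), list(y), list(X_val), list(y_val)  # no split; val_size ignored
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be in (0, 1)")
    idx = list(range(len(X)))
    random.Random(random_state).shuffle(idx)  # seeded, so the split is reproducible
    n_val = max(1, round(val_size * len(idx)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
Xt, yt, Xv, yv = split_train_val(X, y, val_size=0.2)
print(len(Xt), len(Xv))  # 8 2
```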

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
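The docstring describes neighbour-based contrastive pretraining controlled by k_neighbors and temperature. The sketch below shows one generic way to form such a loss for a single anchor row. Neighbours found in the raw feature space act as positives, and cosine similarities between embeddings are divided by the temperature before a softmax over all other rows (an InfoNCE-style objective). This is an illustrative assumption, not deeptab's actual loss, and all function names are hypothetical:

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(features: Sequence[Vec], i: int, k: int) -> List[int]:
    """Indices of the k rows closest to row i in feature space (Euclidean)."""
    others = [j for j in range(len(features)) if j != i]
    return sorted(others, key=lambda j: math.dist(features[i], features[j]))[:k]


def neighbour_contrastive_loss(
    features: Sequence[Vec], embeddings: Sequence[Vec], i: int,
    k_neighbors: int = 10, temperature: float = 0.1,
) -> float:
    """-log(share of softmax mass that anchor i places on its feature-space neighbours)."""
    positives = set(knn(features, i, k_neighbors))
    logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
              for j in range(len(embeddings)) if j != i}
    denom = sum(math.exp(v) for v in logits.values())
    pos = sum(math.exp(logits[j]) for j in positives)
    return -math.log(pos / denom)


feats = [[0.0], [0.1], [5.0], [5.1]]
good = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # neighbours embedded close together
bad = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]   # neighbours embedded far apart
print(neighbour_contrastive_loss(feats, good, 0, k_neighbors=1) <
      neighbour_contrastive_loss(feats, bad, 0, k_neighbors=1))  # True
```

A lower temperature sharpens the softmax, so the loss penalises misplaced neighbours more strongly.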

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
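The bundle described above groups four kinds of state: config, preprocessor, feature metadata and weights. The sketch below illustrates the idea as a minimal round trip with pickle and a temporary file. The dictionary keys and values are hypothetical, and the on-disk format that save() actually writes is not specified on this page:

```python
import os
import pickle
import tempfile

# Hypothetical bundle layout mirroring the four parts named in the docstring.
bundle = {
    "config": {"d_model": 128, "n_heads": 8, "n_layers": 4},
    "preprocessor": {"numerical_preprocessing": "ple", "n_bins": 64},
    "feature_metadata": {"num_features": ["age", "income"], "cat_features": ["city"]},
    "weights": {"head.weight": [[0.1, -0.2]], "head.bias": [0.0]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == bundle)  # True
```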

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabTransformerRegressor(*args: Any, **kwargs: Any)[source]

TabTransformer regressor. This class extends the SklearnBaseRegressor class and uses the TabTransformer model with the default TabTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TabTransformer model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of layers in the transformer.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer layers.

  • transformer_dim_feedforward (int, default=512) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=True) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections between the layers of the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
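binning_strategy (and knots_strategy for splines) chooses where bin edges or knots are placed. The sketch below computes both options for a skewed feature in plain Python. It is a generic illustration of "uniform" versus "quantile" placement using a simple nearest-rank quantile, not deeptab's preprocessing code:

```python
from typing import List, Sequence


def uniform_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Equally spaced edges between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Edges at empirical quantiles, so each bin holds roughly the same count.

    Uses a simple nearest-rank rule on the sorted values (no interpolation).
    """
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, round(i * (n - 1) / n_bins))] for i in range(n_bins + 1)]


skewed = [1, 1, 2, 2, 3, 3, 4, 5, 50, 100]
print(uniform_edges(skewed, 4))   # [1.0, 25.75, 50.5, 75.25, 100.0]
print(quantile_edges(skewed, 4))  # [1, 2, 3, 5, 100]
```

With uniform edges, eight of the ten values fall into the first bin. Quantile edges spread them far more evenly, which is usually preferable for heavy-tailed features.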

Examples

>>> from deeptab.models import TabTransformerRegressor
>>> model = TabTransformerRegressor()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics, logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
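lr_patience and lr_factor describe reduce-on-plateau scheduling. Once the validation loss has failed to improve for more than lr_patience consecutive epochs, the learning rate is multiplied by lr_factor. The plain-Python sketch below traces that rule. It is modelled loosely on torch.optim.lr_scheduler.ReduceLROnPlateau without that scheduler's threshold and cooldown options. This page does not confirm that deeptab wires these parameters to that scheduler, and lr_schedule is a hypothetical helper:

```python
from typing import List, Sequence


def lr_schedule(
    val_losses: Sequence[float], lr: float, lr_patience: int, lr_factor: float
) -> List[float]:
    """Learning rate in effect after each epoch under a reduce-on-plateau rule.

    The rate is cut once the number of consecutive non-improving epochs exceeds
    `lr_patience` (as in torch's ReduceLROnPlateau, minus its threshold/cooldown).
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > lr_patience:
            lr *= lr_factor
            bad_epochs = 0  # start counting again after each reduction
        history.append(lr)
    return history


losses = [1.0, 0.9, 0.95, 0.93, 0.92, 0.91, 0.94]
print(lr_schedule(losses, lr=1e-3, lr_patience=2, lr_factor=0.5))
# [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005, 0.0005]
```

The cut happens at epoch 4, the third consecutive epoch that fails to beat the best loss of 0.9.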

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
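The counting rule behind requires_grad can be sketched as follows, assuming the underlying network exposes PyTorch-style parameters that provide numel() and requires_grad (as torch.nn.Parameter does). count_parameters and FakeParam are hypothetical names; FakeParam only stands in for real tensors so the sketch runs without PyTorch.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


def count_parameters(params: Iterable, requires_grad: bool = True) -> int:
    """Sum parameter element counts; with requires_grad=True only trainable ones count."""
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter exposing only numel() and requires_grad."""

    shape: Tuple[int, ...]
    requires_grad: bool = True

    def numel(self) -> int:
        return math.prod(self.shape)


params = [FakeParam((64, 32)), FakeParam((32,)), FakeParam((10, 10), requires_grad=False)]
```

With PyTorch installed, the same function accepts model.parameters() directly, since torch.nn.Parameter provides both numel() and requires_grad.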

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
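One plausible reading of prune_by_epoch=True is sketched below: once a trial reaches prune_epoch, it is stopped if its validation loss at that epoch is worse than the best loss any earlier trial reached at the same epoch. This rule is an assumption made for illustration; deeptab's exact pruning criterion is not spelled out here, and prune_at_epoch is a hypothetical name.

```python
from typing import Optional, Sequence


def prune_at_epoch(
    val_losses: Sequence[float],
    best_at_epoch: Optional[float],
    prune_epoch: int = 5,
) -> bool:
    """Return True if the current trial should be stopped early (assumed rule).

    val_losses: validation loss per epoch for the current trial, epoch 1 first.
    best_at_epoch: lowest loss an earlier trial reached at prune_epoch, or None.
    """
    if best_at_epoch is None or len(val_losses) < prune_epoch:
        return False  # nothing to compare against yet, or checkpoint epoch not reached
    # prune_epoch is 1-based, so the checkpoint loss sits at index prune_epoch - 1.
    return val_losses[prune_epoch - 1] > best_at_epoch
```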

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved to save_path after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This method requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
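To make the roles of k_neighbors and temperature concrete, here is a generic neighbourhood-based InfoNCE loss in plain Python: each row's k nearest neighbours in the raw feature space act as positives, and temperature-scaled cosine similarities between embeddings act as logits. This is a sketch of the general technique, not deeptab's loss; use_positive, use_negative and pool_sequence are not modelled, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def knn_indices(features: Sequence[Vector], k: int) -> List[List[int]]:
    """Indices of the k nearest rows (Euclidean, excluding the row itself) in feature space."""
    result = []
    for i, f in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: _sq_dist(f, features[j]))
        result.append(others[:k])
    return result


def neighbour_contrastive_loss(
    features: Sequence[Vector],
    embeddings: Sequence[Vector],
    k_neighbors: int = 10,
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss treating each row's k feature-space neighbours as positives."""
    n = len(embeddings)
    positives = knn_indices(features, k_neighbors)
    total = 0.0
    for i in range(n):
        # Temperature-scaled cosine similarities to every other row; self is masked out.
        logits = [
            _cosine(embeddings[i], embeddings[j]) / temperature if j != i else -math.inf
            for j in range(n)
        ]
        peak = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Average the negative log-softmax probability over the positive neighbours.
        total += -sum(logits[j] - log_denom for j in positives[i]) / len(positives[i])
    return total / n
```

Embeddings that keep feature-space neighbours close yield a lower loss than embeddings that scramble them, which is the signal pretraining optimizes.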

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
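Any callable with the signature metric(y_true, y_pred) can replace the default mean_squared_error, for instance sklearn.metrics.mean_absolute_error or a hand-written function such as the hypothetical mae below.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error with the metric(y_true, y_pred) signature expected by score()."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# With a fitted regressor (not constructed here):
# model.score(X_test, y_test, metric=mae)
```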

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabTransformerLSS(*args: Any, **kwargs: Any)[source]

TabTransformer for distributional regression. This class extends the SklearnBaseLSS class and uses the TabTransformer model with the default TabTransformer configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TabTransformer model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of layers in the transformer.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function used in the feed-forward sublayers of the transformer blocks.

  • transformer_dim_feedforward (int, default=512) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=True) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
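The difference between the two binning_strategy options can be sketched as follows, assuming the usual definitions: "uniform" places equal-width bins between the minimum and maximum, while "quantile" places edges so each bin holds roughly the same number of samples. deeptab's preprocessor may compute edges differently (for example via NumPy or scikit-learn); bin_edges is a hypothetical name.

```python
from typing import List, Sequence


def bin_edges(values: Sequence[float], n_bins: int, strategy: str = "uniform") -> List[float]:
    """Return n_bins + 1 edges using equal-width or equal-frequency placement."""
    xs = sorted(values)
    lo, hi = xs[0], xs[-1]
    if strategy == "uniform":
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins + 1)]
    if strategy == "quantile":
        edges = []
        for i in range(n_bins + 1):
            pos = i * (len(xs) - 1) / n_bins   # fractional index of the i-th quantile
            left = int(pos)
            right = min(left + 1, len(xs) - 1)
            frac = pos - left
            edges.append(xs[left] + frac * (xs[right] - xs[left]))  # linear interpolation
        return edges
    raise ValueError(f"unknown strategy: {strategy!r}")
```

On a skewed feature such as [0, 1, ..., 8, 100], uniform edges with n_bins=2 are [0, 50, 100], so nine of ten samples share one bin, whereas quantile edges are [0, 4.5, 100].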

Examples

>>> from deeptab.models import TabTransformerLSS
>>> model = TabTransformerLSS()
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
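When X_val is None, val_size and random_state control an internal hold-out split. The idea is sketched below with the standard library; it is conceptually similar to sklearn.model_selection.train_test_split(test_size=val_size, random_state=random_state), but deeptab's exact splitting (rounding, stratification) is not documented here, and split_indices is a hypothetical name.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int, val_size: float = 0.2, random_state: int = 101
) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and hold out a val_size fraction."""
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be in (0, 1)")
    idx = list(range(n_samples))
    random.Random(random_state).shuffle(idx)  # same random_state gives the same split
    n_val = max(1, round(n_samples * val_size))
    return idx[n_val:], idx[:n_val]  # (train indices, validation indices)
```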

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict, optional) – A dictionary mapping metric names to metric functions. If None, the default metrics for the distribution family are used (see get_default_metrics).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it is inferred from the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a particular distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
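For family='normal', the NLL score measures how plausible the observed targets are under the predicted distributions. The sketch below assumes the model yields a mean and a positive standard deviation per sample and averages over samples; deeptab's internal parameterisation (standard deviation vs. variance) and its aggregation (mean vs. sum) are not stated here, and gaussian_nll is a hypothetical name.

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean negative log-likelihood of y under N(mu, sigma^2), one distribution per sample."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("sigma must be positive")
        z = (yi - mi) / si
        # -log N(y | mu, sigma^2) = 0.5*log(2*pi) + log(sigma) + 0.5*z^2
        total += 0.5 * math.log(2 * math.pi) + math.log(si) + 0.5 * z * z
    return total / len(y)
```

Lower is better: a perfectly centred prediction with sigma=1 scores 0.5*log(2*pi), roughly 0.919, and the score grows as targets fall further from the predicted mean.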

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.ResNetClassifier(*args: Any, **kwargs: Any)[source]

ResNet classifier. This class extends the SklearnBaseClassifier class and uses the ResNet model with the default ResNet configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default ResNet model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the ResNet.

  • activation (callable, default=nn.SELU()) – Activation function for the ResNet layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the ResNet.

  • dropout (float, default=0.5) – Dropout rate for regularization.

  • norm (bool, default=False) – Whether to use normalization in the ResNet.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the ResNet.

  • skip_connections (bool, default=True) – Whether to use skip connections in the ResNet.

  • num_blocks (int, default=3) – Number of residual blocks in the ResNet.

  • average_embeddings (bool, default=True) – Whether to average embeddings during the forward pass.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import ResNetClassifier
>>> model = ResNetClassifier()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
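The (metric function, needs_probabilities) tuples in the metrics dictionary can be illustrated with a small dispatcher: metrics flagged True receive class probabilities, the rest receive hard labels. evaluate_sketch, accuracy and log_loss are hypothetical stand-ins written in plain Python, and y_true is assumed to hold integer class indices.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Each entry: metric name -> (metric function, whether it needs probabilities)
MetricSpec = Tuple[Callable, bool]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    # Mean negative log-probability assigned to the true class.
    return -sum(math.log(max(row[t], eps)) for t, row in zip(y_true, proba)) / len(y_true)


def evaluate_sketch(y_true, labels, proba, metrics: Dict[str, MetricSpec]) -> Dict[str, float]:
    """Pass probabilities or hard labels to each metric according to its boolean flag."""
    return {
        name: fn(y_true, proba if needs_proba else labels)
        for name, (fn, needs_proba) in metrics.items()
    }


scores = evaluate_sketch(
    y_true=[0, 1],
    labels=[0, 1],
    proba=[[0.9, 0.1], [0.2, 0.8]],
    metrics={"accuracy": (accuracy, False), "log_loss": (log_loss, True)},
)
```

With a fitted ResNetClassifier, the corresponding argument would use scikit-learn functions, e.g. metrics={"accuracy": (accuracy_score, False), "log_loss": (log_loss, True)} with both imported from sklearn.metrics.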

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) that are logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
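For scikit-learn-style classifiers, the label returned by predict usually corresponds to the column with the highest probability from predict_proba. The sketch below shows that mapping; the argmax relationship is an assumption (this reference does not state it for deeptab), and labels_from_proba and the classes argument are hypothetical.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]], classes: Sequence) -> List:
    """Pick, for each row, the class whose predicted probability is highest."""
    # max() over column indices returns the first index on ties.
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]
```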

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved to save_path after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This method requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
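The neighborhood-based contrastive objective described above can be illustrated with a small, self-contained sketch. It is a generic InfoNCE-style loss and not deeptab's `_pretrain` implementation: the k nearest neighbors of each sample in raw feature space (Euclidean distance) are treated as positives, similarities are cosine similarities divided by `temperature`, and every other sample appears in the softmax denominator. All function names are hypothetical.

```python
import math


def knn_indices(features, i, k):
    """Indices of the k rows closest to row i (Euclidean distance, excluding i)."""
    dists = sorted(
        (math.dist(features[i], features[j]), j)
        for j in range(len(features))
        if j != i
    )
    return [j for _, j in dists[:k]]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def neighbor_contrastive_loss(features, embeddings, k_neighbors=1, temperature=0.1):
    """Average InfoNCE loss with feature-space nearest neighbors as positives."""
    n = len(features)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarity between sample i and every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Log of the softmax denominator, computed with the max-subtraction trick.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        positives = knn_indices(features, i, k_neighbors)
        # Mean negative log-probability assigned to the positive neighbors.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]     # neighbors embedded close together
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]  # neighbors embedded far apart
print(neighbor_contrastive_loss(features, aligned) < neighbor_contrastive_loss(features, misaligned))  # True
```

Lower temperatures sharpen the softmax, so the loss reacts more strongly to small differences in similarity.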

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
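Because `metric` is documented as a (function, needs_probabilities) pair, a custom metric only has to declare whether it consumes probabilities or hard labels. The pure-Python sketch below shows one way such a pair can be dispatched; the helper names are hypothetical. With scikit-learn one would typically pass a pair such as `(sklearn.metrics.accuracy_score, False)` instead.

```python
import math


def log_loss(y_true, proba):
    """Mean negative log-probability of the true class (labels are column indices)."""
    eps = 1e-15  # guards against log(0)
    return -sum(math.log(max(p[y], eps)) for y, p in zip(y_true, proba)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of exactly matching labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def score_with(metric, y_true, proba):
    """Apply a (metric_fn, needs_proba) pair to a matrix of class probabilities."""
    metric_fn, needs_proba = metric
    if needs_proba:
        return metric_fn(y_true, proba)
    # The metric expects hard labels, so take the most probable column per row.
    labels = [max(range(len(row)), key=row.__getitem__) for row in proba]
    return metric_fn(y_true, labels)


y = [0, 1, 1]
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(score_with((accuracy, False), y, proba))  # 0.6666666666666666
print(score_with((log_loss, True), y, proba))
```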

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.ResNetRegressor(*args: Any, **kwargs: Any)[source]

ResNet regressor. This class extends the SklearnBaseRegressor class and uses the ResNet model with the default ResNet configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default ResNet model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the ResNet.

  • activation (callable, default=nn.SELU()) – Activation function for the ResNet layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the ResNet.

  • dropout (float, default=0.5) – Dropout rate for regularization.

  • norm (bool, default=False) – Whether to use normalization in the ResNet.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the ResNet.

  • skip_connections (bool, default=True) – Whether to use skip connections in the ResNet.

  • num_blocks (int, default=3) – Number of residual blocks in the ResNet.

  • average_embeddings (bool, default=True) – Whether to average embeddings during the forward pass.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
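The options `cat_cutoff`, `min_unique_vals`, and `treat_all_integers_as_numerical` jointly decide whether a column is handled as numerical or categorical. The exact rule is not specified here, so the sketch below is a hypothetical reading of the descriptions above and should not be read as deeptab's implementation. In particular, it assumes that a float `cat_cutoff` is a fraction of the number of rows and an integer `cat_cutoff` is an absolute count of unique values.

```python
def infer_feature_type(values, cat_cutoff=0.03, min_unique_vals=5,
                       treat_all_integers_as_numerical=False):
    """Hypothetical rule returning 'numerical' or 'categorical' for one numeric column."""
    n_unique = len(set(values))
    all_integers = all(float(v).is_integer() for v in values)
    if all_integers and treat_all_integers_as_numerical:
        return "numerical"  # integer columns are numerical regardless of cardinality
    if n_unique < min_unique_vals:
        return "categorical"  # too few distinct values to treat as continuous
    if all_integers:
        # Assumption: a float cutoff is a fraction of rows, an int cutoff a count.
        threshold = cat_cutoff * len(values) if isinstance(cat_cutoff, float) else cat_cutoff
        if n_unique <= threshold:
            return "categorical"
    return "numerical"


codes = [i % 10 for i in range(1000)]      # 10 distinct integers in 1000 rows
prices = [i * 0.37 for i in range(1000)]   # continuous values
print(infer_feature_type(codes), infer_feature_type(prices))  # categorical numerical
```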

Examples

>>> from deeptab.models import ResNetRegressor
>>> model = ResNetRegressor()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during validation.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object
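The `lr_patience` and `lr_factor` descriptions match a reduce-on-plateau learning-rate schedule, such as the one provided by `torch.optim.lr_scheduler.ReduceLROnPlateau`. Assuming deeptab uses a schedule of that kind, the pure-Python sketch below traces how the learning rate would evolve over a sequence of validation losses. It is simplified: it omits the scheduler's threshold, cooldown, and minimum-learning-rate options.

```python
def plateau_schedule(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.1):
    """Learning rate in effect after each epoch under a simple reduce-on-plateau rule."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                lr *= lr_factor  # plateau detected: shrink the learning rate
                bad_epochs = 0
        history.append(lr)
    return history


print(plateau_schedule([1.0, 0.9, 0.95, 0.95, 0.95, 0.8]))
```

In deeptab these values would be passed through the documented keyword arguments, for example `lr=1e-3, lr_patience=5, lr_factor=0.5`.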

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
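Encoding in batches bounds memory use without changing the output, provided the network runs in evaluation mode so that dropout and similar layers behave deterministically. The sketch below shows the batching pattern with a hypothetical `encode_fn` standing in for the embedding layer.

```python
def encode_in_batches(rows, encode_fn, batch_size=64):
    """Apply encode_fn to consecutive batches and concatenate the outputs in order."""
    outputs = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # the final batch may be smaller
        outputs.extend(encode_fn(batch))
    return outputs


seen_sizes = []


def fake_encoder(batch):
    """Hypothetical encoder: records the batch size and maps each row to a 2-d vector."""
    seen_sizes.append(len(batch))
    return [[float(r), 2.0 * r] for r in batch]


encoded = encode_in_batches(list(range(150)), fake_encoder)
print(len(encoded), seen_sizes)  # 150 [64, 64, 22]
```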

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer.

Returns:

self – The fitted regressor.

Return type:

object
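The `patience`, `monitor`, and `mode` arguments describe standard early stopping on a logged metric. The pure-Python sketch below shows the rule in isolation, in the style of PyTorch Lightning's `EarlyStopping` callback with no minimum delta: the counter resets whenever the monitored value strictly improves, and training stops once the counter reaches `patience`. The function name is hypothetical.

```python
def early_stopping_epoch(history, patience=15, mode="min"):
    """Return the 1-based epoch at which training stops, or None if it runs to the end."""
    best = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if best is None or (value < best if mode == "min" else value > best):
            best, wait = value, 0  # strict improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_loss = [0.5, 0.4, 0.41, 0.42, 0.43]
print(early_stopping_epoch(val_loss, patience=3))                      # 5
print(early_stopping_epoch([0.1, 0.2, 0.3], mode="max", patience=1))  # None
```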

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
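For intuition about the numbers this method reports: a fully connected layer mapping n_in features to n_out features holds n_in * n_out weights plus n_out biases. The sketch below applies that arithmetic to a hypothetical plain stack that uses the default `layer_sizes` (256, 128, 32) with 10 input features and one output. A real ResNet adds residual-block, normalization, and embedding parameters, so get_number_of_params() will report a different total. In PyTorch, such counts are commonly obtained with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.

```python
def dense_param_count(sizes):
    """Weights plus biases of consecutive fully connected layers sized sizes[0] -> sizes[-1]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Hypothetical network: 10 inputs -> 256 -> 128 -> 32 -> 1 output.
print(dense_param_count([10, 256, 128, 32, 1]))  # 39873
```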

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
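The two pruning modes can be contrasted with a small decision function. It is a hypothetical illustration of the documented options, not deeptab's pruning logic: `reference` stands in for whatever benchmark loss earlier trials established, and a trial is pruned when its loss at `prune_epoch` (or its best loss so far, when `prune_by_epoch=False`) is worse than that benchmark.

```python
def should_prune(trial_losses, reference, prune_by_epoch=True, prune_epoch=5):
    """Hypothetical pruning rule: prune when the trial looks worse than `reference`.

    trial_losses holds the validation loss after each completed epoch, and
    reference stands in for the benchmark set by earlier trials.
    """
    if prune_by_epoch:
        if len(trial_losses) < prune_epoch:
            return False  # the checkpoint epoch has not been reached yet
        value = trial_losses[prune_epoch - 1]  # loss at the chosen epoch (1-based)
    else:
        value = min(trial_losses)  # best loss seen so far
    return value > reference


losses = [1.0, 0.9, 0.8, 0.7, 0.65]
print(should_prune(losses, reference=0.6))                        # True
print(should_prune(losses, reference=0.6, prune_by_epoch=False))  # True
print(should_prune(losses, reference=0.7))                        # False
```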

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
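Any callable with the signature metric(y_true, y_pred) can be supplied, for example `sklearn.metrics.mean_absolute_error`. The sketch below defines two such metrics in plain Python to make the expected contract concrete. The commented call at the end assumes a fitted regressor named `model` and test arrays that are not defined here.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))   # 0.375
print(mean_absolute_error(y_true, y_pred))  # 0.5

# With a fitted regressor (not defined here):
# model.score(X_test, y_test, metric=mean_absolute_error)
```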

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.ResNetLSS(*args: Any, **kwargs: Any)[source]

ResNet for distributional regression. This class extends the SklearnBaseLSS class and uses the ResNet model with the default ResNet configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default ResNet model with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the ResNet.

  • activation (callable, default=nn.SELU()) – Activation function for the ResNet layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the ResNet.

  • dropout (float, default=0.5) – Dropout rate for regularization.

  • norm (bool, default=False) – Whether to use normalization in the ResNet.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the ResNet.

  • skip_connections (bool, default=True) – Whether to use skip connections in the ResNet.

  • num_blocks (int, default=3) – Number of residual blocks in the ResNet.

  • average_embeddings (bool, default=True) – Whether to average embeddings during the forward pass.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import ResNetLSS
>>> model = ResNetLSS()
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
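When X_val is not given, a fraction val_size of the rows is held out for validation, and random_state makes the shuffle reproducible. The sketch below mimics that behavior on row indices using only the standard library. Rounding the validation size up follows scikit-learn's train_test_split convention; that deeptab splits the data the same way is an assumption.

```python
import math
import random


def train_val_split(n_samples, val_size=0.2, random_state=101):
    """Return (train_indices, val_indices) after a reproducible shuffle."""
    rng = random.Random(random_state)  # seeded generator: same seed, same split
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = math.ceil(n_samples * val_size)  # round the validation size up
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = train_val_split(10)
print(len(train_idx), len(val_idx))  # 8 2
```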

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the distributional regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Additional arguments specific to the chosen distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to torchmetrics metrics logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Scoring metric. Currently only the negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
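For family='normal', the negative log-likelihood of an observation y under a predicted normal distribution with mean mu and standard deviation sigma is log(sigma) + 0.5 * log(2 * pi) + (y - mu)^2 / (2 * sigma^2); lower is better. The sketch below averages this quantity over samples. It assumes the predicted distribution is available as a mean and a standard deviation per sample; how the model's raw outputs map to these (for example, whether the scale is returned as a variance or on a log scale) is not specified here.

```python
import math


def gaussian_nll(y_true, means, stds):
    """Average negative log-likelihood of targets under per-sample normal distributions."""
    total = 0.0
    for y, mu, sigma in zip(y_true, means, stds):
        total += (
            math.log(sigma)
            + 0.5 * math.log(2 * math.pi)
            + (y - mu) ** 2 / (2 * sigma ** 2)
        )
    return total / len(y_true)


y = [1.0, 2.0, 3.0]
print(gaussian_nll(y, means=[1.0, 2.0, 3.0], stds=[1.0, 1.0, 1.0]))  # ~0.919, i.e. 0.5 * log(2 * pi)
print(gaussian_nll(y, means=[2.0, 3.0, 4.0], stds=[1.0, 1.0, 1.0]))  # larger: every mean is off by 1
```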

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
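get_params and set_params follow the scikit-learn estimator convention: get_params returns the constructor arguments as a dict, and set_params updates them and returns the estimator itself so calls can be chained. Utilities such as `sklearn.base.clone` rely on this contract to build an unfitted copy. The stand-in class below is hypothetical and only demonstrates the contract; it is not deeptab code.

```python
class TinyEstimator:
    """Hypothetical stand-in that follows scikit-learn's get_params/set_params contract."""

    def __init__(self, d_model=64, dropout=0.0):
        self.d_model = d_model
        self.dropout = dropout

    def get_params(self, deep=True):
        # Constructor arguments and their current values.
        return {"d_model": self.d_model, "dropout": self.dropout}

    def set_params(self, **parameters):
        valid = self.get_params()
        for name, value in parameters.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


est = TinyEstimator().set_params(dropout=0.1)
rebuilt = TinyEstimator(**est.get_params())  # roughly what sklearn.base.clone does
print(rebuilt.get_params())  # {'d_model': 64, 'dropout': 0.1}
```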

class deeptab.models.MambaTabClassifier(*args: Any, **kwargs: Any)[source]

MambaTab classifier. This class extends the SklearnBaseClassifier class and uses the MambaTab model with the default MambaTab configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default MambaTab model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=1) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=16) – Kernel size of the convolutional layers.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.05) – Dropout rate for regularization.

  • dt_rank (str, default="auto") – Rank of the projection that produces the time step Δ ("dt") in the state-space layers.

  • d_state (int, default=128) – Dimensionality of the state in the state-space layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection weights.

  • dt_init (str, default="random") – Initialization method for the Δ projection weights.

  • dt_max (float, default=0.1) – Maximum value of Δ at initialization.

  • dt_min (float, default=1e-04) – Minimum value of Δ at initialization.

  • dt_init_floor (float, default=1e-04) – Lower bound applied to the initial Δ values.

  • activation (callable, default=nn.ReLU()) – Activation function for the model.

  • axis (int, default=1) – Axis along which operations are applied, if applicable.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.0) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.ReLU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • use_pscan (bool, default=False) – Whether to use PSCAN for the state-space model.

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
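The binning parameters above (n_bins, binning_strategy) and the default numerical_preprocessing="ple" can be illustrated on a single numerical column. The sketch below computes uniform or quantile bin edges and then applies a piecewise linear encoding (PLE) to one value. It is a plain-Python illustration of the general technique and assumes that deeptab's preprocessor follows the usual PLE definition. bin_edges and ple_encode are hypothetical helpers, not part of deeptab.

```python
import statistics


def bin_edges(values, n_bins, strategy="uniform"):
    """Return n_bins + 1 increasing bin edges spanning the observed values."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins)] + [hi]
    if strategy == "quantile":
        # statistics.quantiles returns the n_bins - 1 interior cut points.
        inner = statistics.quantiles(values, n=n_bins, method="inclusive")
        return [lo] + inner + [hi]
    raise ValueError(f"unknown binning strategy: {strategy!r}")


def ple_encode(x, edges):
    """Piecewise linear encoding of x: one component per bin.

    Bins entirely below x contribute 1.0, bins entirely above x contribute 0.0,
    and the bin containing x contributes the fraction of its width that x covers.
    """
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x >= right:
            encoding.append(1.0)
        elif x < left:
            encoding.append(0.0)
        else:
            encoding.append((x - left) / (right - left))
    return encoding


values = [float(v) for v in range(9)]
edges = bin_edges(values, n_bins=4)  # [0.0, 2.0, 4.0, 6.0, 8.0]
print(ple_encode(5.0, edges))        # [1.0, 1.0, 0.5, 0.0]

skewed = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(bin_edges(skewed, 4))              # [1.0, 25.75, 50.5, 75.25, 100]
print(bin_edges(skewed, 4, "quantile"))  # [1, 3.0, 5.0, 7.0, 100]
```

On the skewed column, quantile edges place the bins among the bulk of the data, while uniform edges spend most of the range on the single outlier.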

Examples

>>> from deeptab.models import MambaTabClassifier
>>> model = MambaTabClassifier(d_model=64, n_layers=2)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
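The interplay of val_size, X_val, y_val, and random_state described above can be sketched as follows. make_validation_split is a hypothetical plain-Python helper that mirrors the documented rules. It assumes a seeded shuffle followed by a hold-out split and rounds the validation count up, as scikit-learn's train_test_split does for fractional sizes; deeptab's own splitting code may differ in these details.

```python
import math
import random


def make_validation_split(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Return (X_train, y_train, X_val, y_val) following the documented split rules."""
    if X_val is not None:
        # An explicit validation set wins: X and y are used for training unchanged.
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return X, y, X_val, y_val
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # seeded, so the split is reproducible
    n_val = math.ceil(len(X) * val_size)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = list(range(10))
y = [2 * v for v in X]
X_train, y_train, X_hold, y_hold = make_validation_split(X, y, val_size=0.2)
print(len(X_train), len(X_hold))  # 8 2
```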

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array with embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
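The tuple format of metrics can be illustrated without deeptab. The sketch below assumes that evaluate dispatches roughly as the Notes describe: metrics flagged False receive predicted labels and metrics flagged True receive predicted probabilities. evaluate_with, accuracy, and log_loss are hypothetical pure-Python stand-ins; with deeptab itself you would pass real functions, for example {"accuracy": (sklearn.metrics.accuracy_score, False), "log_loss": (sklearn.metrics.log_loss, True)}.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability assigned to the true class (integer labels)."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def evaluate_with(predict, predict_proba, X, y_true, metrics):
    """Score each metric on labels or probabilities, as its boolean flag requests."""
    labels = probabilities = None
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        if needs_proba:
            if probabilities is None:
                probabilities = predict_proba(X)  # computed once, then reused
            scores[name] = metric_fn(y_true, probabilities)
        else:
            if labels is None:
                labels = predict(X)
            scores[name] = metric_fn(y_true, labels)
    return scores


# Stand-in predictors for a two-class problem (hypothetical, no model involved).
PROBA = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
scores = evaluate_with(
    predict=lambda X: [row.index(max(row)) for row in PROBA],
    predict_proba=lambda X: PROBA,
    X=None,
    y_true=[0, 1, 1],
    metrics={"accuracy": (accuracy, False), "log_loss": (log_loss, True)},
)
print(scores)
```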

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
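Assuming fit hands patience, monitor, and mode to an early-stopping callback in PyTorch Lightning (plausible given **trainer_kwargs, but not stated on this page), the stopping rule reduces to the plain-Python sketch below, with a zero improvement threshold. early_stopping_epoch is a hypothetical helper that replays a history of monitored values and reports the epoch at which training would halt.

```python
def early_stopping_epoch(history, patience=15, mode="min"):
    """Return the 0-based epoch at which training would stop, or None if it never does.

    history holds the monitored value (for example val_loss) after each epoch.
    """
    best = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(history):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:  # mode == "max"
            improved = value > best
        if improved:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9], patience=3))  # 5
```

With mode="max" the same rule applies to metrics where larger is better, such as a validation accuracy passed through monitor.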

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
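For a softmax classifier, predict and predict_proba are normally consistent: the predicted label is the class with the highest probability. That convention is an assumption here, since this page does not state it. labels_from_proba is a hypothetical helper showing the mapping from an (n_samples, n_classes) probability array to labels.

```python
def labels_from_proba(probabilities, classes):
    """Return, for each row, the class whose predicted probability is highest."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda k: row[k])  # first index wins ties
        labels.append(classes[best])
    return labels


print(labels_from_proba([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]], ["a", "b", "c"]))  # ['b', 'a']
```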

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method pretrains the embeddings so that they reflect the neighborhood structure of the feature space. The embeddings are saved to save_path after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
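The contrastive objective behind pretrain can be sketched in plain Python. The version below is a generic InfoNCE-style loss: each row's k_neighbors nearest rows in the original feature space act as positives, every other row appears in the normalizing denominator, and cosine similarities are divided by temperature. It only illustrates the roles of k_neighbors and temperature. It does not model use_positive, use_negative, or pool_sequence, and deeptab's actual _pretrain loss may be formulated differently. All helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(features, i, k):
    """Indices of the k rows closest to row i (squared Euclidean distance)."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(features[i], row)), j)
        for j, row in enumerate(features)
        if j != i
    )
    return [j for _, j in distances[:k]]


def neighbor_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """InfoNCE-style loss in which each row's feature-space neighbors are its positives."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        top = max(logits.values())  # log-sum-exp with max subtraction for stability
        log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        positives = nearest_neighbors(features, i, k_neighbors)
        # Average -log softmax probability over the positive pairs of anchor i.
        total += -sum(logits[p] - log_norm for p in positives) / len(positives)
    return total / n


features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
print(neighbor_contrastive_loss(features, aligned, k_neighbors=1))
print(neighbor_contrastive_loss(features, shuffled, k_neighbors=1))
```

Embeddings that keep feature-space neighbors close (aligned) receive a much lower loss than the same vectors assigned to the wrong rows (shuffled).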

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambaTabRegressor(*args: Any, **kwargs: Any)[source]

MambaTab regressor. This class extends the SklearnBaseRegressor class and uses the MambaTab model with the default MambaTab configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Default MambaTab model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=1) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=16) – Kernel size of the 1D convolution in each Mamba block.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.05) – Dropout rate for regularization.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the time step Δ of the state-space layer; "auto" derives it from d_model.

  • d_state (int, default=128) – Dimensionality of the hidden state of the state-space layer.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection.

  • dt_init (str, default="random") – Initialization method for the Δ projection.

  • dt_max (float, default=0.1) – Maximum value of the initial time step Δ.

  • dt_min (float, default=1e-04) – Minimum value of the initial time step Δ.

  • dt_init_floor (float, default=1e-04) – Lower bound applied to the initial time step Δ.

  • activation (callable, default=nn.ReLU()) – Activation function for the model.

  • axis (int, default=1) – Axis along which operations are applied, if applicable.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.0) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.ReLU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • use_pscan (bool, default=False) – Whether to compute the state-space recurrence with a parallel scan (PSCAN).

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
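The dt_* parameters refer to the time step Δ of Mamba's selective state-space layer, not to a decision tree. Assuming deeptab follows the reference Mamba implementation (an assumption; this page does not say so), dt_rank="auto" resolves to ceil(d_model / 16). The Δ bias is initialized by sampling step sizes log-uniformly between dt_min and dt_max, clamping them at dt_init_floor, and storing their inverse softplus so that the softplus activation recovers the sampled Δ. In the reference code, dt_scale and dt_init control the Δ projection weights and are left out here. resolve_dt_rank and init_dt_bias are hypothetical helpers.

```python
import math
import random


def resolve_dt_rank(dt_rank, d_model):
    """Reference-Mamba convention: "auto" becomes ceil(d_model / 16)."""
    return math.ceil(d_model / 16) if dt_rank == "auto" else int(dt_rank)


def softplus(x):
    return math.log1p(math.exp(x))


def init_dt_bias(n, dt_min=1e-4, dt_max=0.1, dt_init_floor=1e-4, seed=0):
    """Sample n step sizes log-uniformly in [dt_min, dt_max]; return their inverse softplus."""
    rng = random.Random(seed)
    log_min, log_max = math.log(dt_min), math.log(dt_max)
    biases = []
    for _ in range(n):
        dt = math.exp(log_min + rng.random() * (log_max - log_min))
        dt = max(dt, dt_init_floor)  # never start below the floor
        # Inverse of softplus, so softplus(bias) recovers dt at initialization.
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases


print(resolve_dt_rank("auto", 64))  # 4
print([round(softplus(b), 6) for b in init_dt_bias(4)])
```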

Examples

>>> from deeptab.models import MambaTabRegressor
>>> model = MambaTabRegressor(d_model=64, n_layers=2)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object
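The descriptions of lr_patience and lr_factor match plateau-based learning-rate decay. Assuming they are passed to a scheduler in the style of PyTorch's ReduceLROnPlateau monitoring the validation loss (not confirmed on this page), the schedule behaves like the simplified sketch below, which drops PyTorch's relative improvement threshold and cooldown. lr_schedule is a hypothetical helper.

```python
def lr_schedule(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.1):
    """Learning rate in effect after each epoch under plateau-based decay.

    Simplified: an epoch counts as an improvement only if the loss is strictly
    below the best so far (no relative threshold, no cooldown).
    """
    best = float("inf")
    bad_epochs = 0
    rates = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:  # patience exhausted: decay and reset
                lr *= lr_factor
                bad_epochs = 0
        rates.append(lr)
    return rates


print(lr_schedule([1.0, 0.9, 0.95, 0.95, 0.95, 0.8], lr=1.0, lr_patience=2, lr_factor=0.5))
# [1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
```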

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method pretrains the embeddings so that they reflect the neighborhood structure of the feature space. The embeddings are saved to save_path after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
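Any callable with the metric(y_true, y_pred) signature can replace the default mean_squared_error. The root-mean-squared-error function below is a hypothetical example written in plain Python. It could then be passed as model.score(X_test, y_test, metric=root_mean_squared_error), where model, X_test, and y_test are placeholders for a fitted regressor and held-out data.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Example metric with the metric(y_true, y_pred) signature that score() expects."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


print(root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```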

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambaTabLSS(*args: Any, **kwargs: Any)[source]

MambaTab LSS for distributional regression. This class extends the SklearnBaseLSS class and uses the MambaTab model with the default MambaTab configuration.
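In distributional regression the model predicts the parameters of a distribution for each sample rather than a single value, and it is typically trained by minimizing the negative log-likelihood of the observed targets. For family='normal' (the value used in the example below), that loss is the Gaussian negative log-likelihood sketched here. The softplus link that keeps the scale positive and the helper name normal_nll are assumptions for illustration, not confirmed details of MambaTabLSS.

```python
import math


def softplus(x):
    return math.log1p(math.exp(x))


def normal_nll(y_true, raw_loc, raw_scale):
    """Mean negative log-likelihood of y_true under Normal(loc, scale).

    raw_loc and raw_scale are two unconstrained network outputs per sample;
    softplus keeps the scale strictly positive.
    """
    total = 0.0
    for y, loc, raw in zip(y_true, raw_loc, raw_scale):
        scale = softplus(raw)
        z = (y - loc) / scale
        total += 0.5 * math.log(2 * math.pi) + math.log(scale) + 0.5 * z * z
    return total / len(y_true)


unit = math.log(math.e - 1)  # softplus(unit) == 1, i.e. a standard deviation of 1
print(normal_nll([0.0, 1.0], [0.0, 0.0], [unit, unit]))
```

Unlike a point-prediction loss, this objective also rewards a well-calibrated scale: an overconfident (too small) scale is penalized heavily when targets fall far from the predicted mean.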

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Default MambaTab model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=1) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=16) – Kernel size of the 1D convolution in each Mamba block.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.05) – Dropout rate for regularization.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the time step Δ of the state-space layer; "auto" derives it from d_model.

  • d_state (int, default=128) – Dimensionality of the hidden state of the state-space layer.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ projection.

  • dt_init (str, default="random") – Initialization method for the Δ projection.

  • dt_max (float, default=0.1) – Maximum value of the initial time step Δ.

  • dt_min (float, default=1e-04) – Minimum value of the initial time step Δ.

  • dt_init_floor (float, default=1e-04) – Lower bound applied to the initial time step Δ.

  • activation (callable, default=nn.ReLU()) – Activation function for the model.

  • axis (int, default=1) – Axis along which operations are applied, if applicable.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.0) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.ReLU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • use_pscan (bool, default=False) – Whether to compute the state-space recurrence with a parallel scan (PSCAN).

  • mamba_version (str, default="mamba-torch") – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • bidirectional (bool, default=False) – Whether to process data bidirectionally.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
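The default numerical preprocessing ("ple") works together with n_bins and binning_strategy. The dependency-free sketch below computes "uniform" or "quantile" bin edges and then applies piecewise linear encoding (PLE) in its commonly published form. It is an illustration only, not deeptab's preprocessing code, and the helper names bin_edges and ple_encode are hypothetical.

```python
def bin_edges(values, n_bins, strategy="uniform"):
    """Return n_bins + 1 increasing bin edges for one numerical column."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        # Equal-width bins between the observed minimum and maximum.
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins + 1)]
    if strategy == "quantile":
        # Edges at evenly spaced quantiles (linear interpolation between order statistics).
        ordered = sorted(values)
        n = len(ordered)
        edges = []
        for i in range(n_bins + 1):
            pos = i * (n - 1) / n_bins
            j = int(pos)
            frac = pos - j
            upper = ordered[min(j + 1, n - 1)]
            edges.append(ordered[j] + frac * (upper - ordered[j]))
        return edges
    raise ValueError(f"unknown binning strategy: {strategy!r}")


def ple_encode(x, edges):
    """Piecewise linear encoding of scalar x; assumes strictly increasing edges."""
    n_bins = len(edges) - 1
    encoding = []
    for t in range(n_bins):
        left, right = edges[t], edges[t + 1]
        if x >= right and t < n_bins - 1:
            encoding.append(1.0)  # x lies past this bin: component fully on
        elif x < left and t > 0:
            encoding.append(0.0)  # x lies before this bin: component off
        else:
            # Fractional position inside the bin; the first and last bins may
            # fall below 0 or exceed 1 for values outside the fitted range.
            encoding.append((x - left) / (right - left))
    return encoding
```

For example, with edges [0, 1, 2, 3, 4] the value 2.5 encodes as [1.0, 1.0, 0.5, 0.0]: every bin below the value is saturated and the bin containing it carries the fractional remainder.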

Examples

>>> from deeptab.models import MambaTabLSS
>>> model = MambaTabLSS(d_model=64, n_layers=2)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
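The interplay of val_size, X_val/y_val and random_state can be sketched as follows: if a validation set is given it is used as-is, otherwise a seeded, shuffled hold-out split is carved out of the training data. This is a minimal sketch under the assumption of a simple random split on plain Python lists; the helper names are hypothetical and deeptab's own splitting (for example its rounding of the validation size) may differ.

```python
import random


def train_val_split(n_samples, val_size=0.2, random_state=101):
    """Return (train_idx, val_idx) for a seeded, shuffled hold-out split."""
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be strictly between 0 and 1")
    rng = random.Random(random_state)  # same seed -> same split
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = int(round(n_samples * val_size))
    return indices[n_val:], indices[:n_val]


def make_validation_data(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Use (X_val, y_val) as given, or carve a validation split out of (X, y)."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return (X, y), (X_val, y_val)  # no split: val_size is ignored

    def take(seq, idx):
        return [seq[i] for i in idx]

    train_idx, val_idx = train_val_split(len(X), val_size, random_state)
    return (take(X, train_idx), take(y, train_idx)), (take(X, val_idx), take(y, val_idx))
```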

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

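  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.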

Returns:

self – The fitted regressor.

Return type:

object
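The patience, monitor and mode arguments describe standard early stopping, conceptually like PyTorch Lightning's EarlyStopping callback. The sketch below shows the rule on a sequence of per-epoch values of the monitored metric; the class name is hypothetical and it assumes a strict improvement test with no minimum delta.

```python
import math


class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience=15, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best = math.inf if mode == "min" else -math.inf
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's metric value; return True if training should stop."""
        improved = value < self.best if self.mode == "min" else value > self.best
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

With patience=3 and validation losses [1.0, 0.8, 0.85, 0.9, 0.95, ...], training stops after the fifth epoch (index 4): the best value 0.8 was followed by three epochs without improvement.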

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
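The page only outlines the pruning options. As an illustration, one plausible rule is sketched below: with prune_by_epoch=True, a trial's validation loss at prune_epoch is compared with a reference from the best earlier trial; with prune_by_epoch=False, its best loss so far is used instead. The should_prune helper and its tolerance factor are hypothetical, and deeptab's actual criterion may differ.

```python
def should_prune(val_losses, best_reference, prune_by_epoch=True, prune_epoch=5, tolerance=1.5):
    """Decide whether to stop a hyperparameter trial early (hypothetical rule).

    val_losses: per-epoch validation losses of the current trial so far.
    best_reference: reference loss from the best earlier trial, or None.
    tolerance: hypothetical factor; prune if the trial is this much worse.
    """
    if best_reference is None or not val_losses:
        return False  # nothing to compare against yet
    if prune_by_epoch:
        if len(val_losses) < prune_epoch:
            return False  # the checkpoint epoch has not been reached
        current = val_losses[prune_epoch - 1]  # epochs counted from 1
    else:
        current = min(val_losses)  # best loss of this trial so far
    return current > tolerance * best_reference
```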

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Currently, only the negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
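For the "normal" family, the negative log-likelihood of a target y under a predicted mean μ and standard deviation σ is ½·log(2πσ²) + (y − μ)²/(2σ²), and lower is better. The sketch below averages this over samples. It assumes the model's output can be read as per-sample means and standard deviations; whether deeptab predicts σ or σ², and whether it averages or sums, is not specified here.

```python
import math


def gaussian_nll(y_true, mean, std):
    """Average negative log-likelihood of y_true under N(mean, std**2)."""
    if not (len(y_true) == len(mean) == len(std)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for y, mu, sigma in zip(y_true, mean, std):
        if sigma <= 0:
            raise ValueError("standard deviations must be positive")
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
    return total / len(y_true)
```

A perfect mean prediction with σ = 1 gives ½·log(2π) ≈ 0.919; shrinking σ lowers the score only if the mean is accurate, which is why NLL rewards calibrated uncertainty rather than just point accuracy.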

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.MambAttentionClassifier(*args: Any, **kwargs: Any)[source]

MambAttention classifier. This class extends the SklearnBaseClassifier class and uses the MambAttention model with the default MambAttention configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Mambular Attention model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • n_heads (int, default=8) – Number of attention heads in the model.

  • last_layer (str, default="attn") – Type of the last layer (e.g., ‘attn’).

  • n_mamba_per_attention (int, default=1) – Number of Mamba blocks per attention layer.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=4) – Kernel size of the convolution over columns.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • dt_rank (str, default="auto") – Rank of the projection used to compute the discretization time step Δ (dt) in the Mamba blocks.

  • d_state (int, default=128) – Dimensionality of the state in recurrent layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ (dt) projection.

  • dt_init (str, default="random") – Initialization method for the Δ (dt) projection weights.

  • dt_max (float, default=0.1) – Maximum value of Δ (dt) at initialization.

  • dt_min (float, default=1e-04) – Minimum value of Δ (dt) at initialization.

  • dt_init_floor (float, default=1e-04) – Lower bound (floor) applied to Δ (dt) at initialization.

  • norm (str, default="LayerNorm") – Type of normalization used in the model.

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process input sequences bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token for sequence pooling.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before passing to Mamba layers.

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • AD_weight_decay (bool, default=True) – Whether weight decay is applied to the A and D matrices of the state-space model.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • use_pscan (bool, default=False) – Whether to use the parallel scan (PSCAN) implementation for the state-space model.

  • n_attention_layers (int, default=1) – Number of attention layers in the model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
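In Mamba-style blocks, "dt" denotes the discretization time step Δ. The sketch below follows the initialization used by the reference Mamba implementation: dt_rank="auto" resolves to ceil(d_model / 16), per-channel step sizes are sampled log-uniformly in [dt_min, dt_max] and clamped at dt_init_floor, and the projection bias is set to their inverse softplus. It assumes, without confirmation from this page, that deeptab uses the same scheme and that the inner width is expand_factor × d_model; the function names are hypothetical.

```python
import math
import random


def resolve_dt_rank(d_model, dt_rank="auto"):
    # Reference Mamba convention: "auto" means ceil(d_model / 16).
    return math.ceil(d_model / 16) if dt_rank == "auto" else int(dt_rank)


def dt_proj_init_std(dt_rank, dt_scale=1.0):
    # Scale of the dt projection weights; dt_init="random" draws them
    # uniformly from [-std, std].
    return dt_rank ** -0.5 * dt_scale


def init_dt_bias(d_inner, dt_min=1e-4, dt_max=0.1, dt_init_floor=1e-4, seed=0):
    """Sample step sizes log-uniformly in [dt_min, dt_max] and return their
    inverse-softplus values, so that softplus(bias) recovers each step size."""
    rng = random.Random(seed)
    biases = []
    for _ in range(d_inner):
        log_dt = rng.random() * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
        dt = max(math.exp(log_dt), dt_init_floor)
        # Inverse of softplus(x) = log(1 + exp(x)).
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases
```

With the defaults d_model=64 and expand_factor=2, this gives dt_rank=4, an inner width of 128 channels, and weight scale 0.5 for dt_scale=1.0.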

Examples

>>> from deeptab.models import MambAttentionClassifier
>>> model = MambAttentionClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
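The dispatch described in the Notes can be sketched as follows: each metrics entry is a (metric_fn, needs_proba) tuple, and the flag decides whether the metric receives predicted probabilities or predicted labels. The function below is hypothetical; it takes the two prediction callables explicitly so that it runs without deeptab, and computes each kind of prediction at most once.

```python
def evaluate_with_metrics(predict, predict_proba, X, y_true, metrics):
    """Compute each metric, routing to probabilities or labels as requested.

    metrics maps a name to (metric_fn, needs_proba); needs_proba=True means
    metric_fn expects probability scores rather than class labels.
    """
    scores = {}
    labels = probs = None  # computed lazily, at most once each
    for name, (metric_fn, needs_proba) in metrics.items():
        if needs_proba:
            if probs is None:
                probs = predict_proba(X)
            scores[name] = metric_fn(y_true, probs)
        else:
            if labels is None:
                labels = predict(X)
            scores[name] = metric_fn(y_true, labels)
    return scores
```

In practice, metric functions such as sklearn.metrics.accuracy_score (labels) and sklearn.metrics.log_loss (probabilities) fit this (function, flag) convention, matching the (log_loss, True) default of score().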

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to torchmetrics metrics, logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
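The lr_patience and lr_factor arguments describe a reduce-on-plateau schedule: after more than lr_patience epochs without improvement in validation loss, the learning rate is multiplied by lr_factor. The sketch below mirrors the core rule of torch.optim.lr_scheduler.ReduceLROnPlateau (mode "min", without its threshold and cooldown options); the class name is hypothetical, its defaults are PyTorch's rather than necessarily deeptab's, and it assumes deeptab uses a scheduler of this kind.

```python
import math


class ReduceLROnPlateauSketch:
    """Multiply the learning rate by `factor` once the validation loss has
    failed to improve for more than `patience` consecutive epochs."""

    def __init__(self, lr, patience=10, factor=0.1, min_lr=0.0):
        if not 0.0 < factor < 1.0:
            raise ValueError("factor must be in (0, 1)")
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.min_lr = min_lr
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current learning rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0  # start counting again after a reduction
        return self.lr
```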

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
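For probabilistic classifiers, predict and predict_proba are usually related by an argmax: the predicted label is the class with the highest probability. The sketch below shows that mapping on plain lists, assuming deeptab follows this convention; the helper name labels_from_proba and the optional classes lookup are hypothetical.

```python
def labels_from_proba(probabilities, classes=None):
    """Map each row of class probabilities to its most likely class."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # argmax; ties go to the first index
        labels.append(classes[best] if classes is not None else best)
    return labels
```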

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
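The neighbourhood-based contrastive idea behind pretrain() can be illustrated with a generic multi-positive InfoNCE objective: for each row, its k_neighbors nearest rows in raw feature space are treated as positives, all other rows as negatives, and cosine similarities between the corresponding embeddings are divided by temperature. This is a sketch of the general technique, not deeptab's _pretrain(); it does not model use_positive, use_negative or pool_sequence, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_indices(features, i, k):
    """Indices of the k rows closest to row i (squared Euclidean), excluding i."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(features[i], row)), j)
        for j, row in enumerate(features)
        if j != i
    )
    return [j for _, j in dists[:k]]


def neighbor_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """Mean multi-positive InfoNCE loss over all rows.

    Positives are each row's nearest neighbours in feature space; similarities
    are computed between the corresponding embeddings.
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        positives = knn_indices(features, i, k_neighbors)
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        shift = max(logits.values())  # subtract the max logit for numerical stability
        denominator = sum(math.exp(v - shift) for v in logits.values())
        numerator = sum(math.exp(logits[j] - shift) for j in positives)
        total += -math.log(numerator / denominator)
    return total / n
```

The loss is small when rows that are close in feature space also have similar embeddings, and a lower temperature sharpens the contrast between positives and negatives.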

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
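The save()/load() contract (one file holding the config, fitted preprocessor, feature metadata and network weights) can be sketched with the standard library's pickle module. The bundle keys and function names here are hypothetical, and deeptab's actual on-disk format is not specified on this page. As with any pickle-based format, only load files from trusted sources.

```python
import pickle


def save_bundle(path, config, preprocessor, feature_info, state_dict):
    """Write everything needed for inference into a single file."""
    bundle = {
        "config": config,
        "preprocessor": preprocessor,
        "feature_info": feature_info,
        "state_dict": state_dict,
    }
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    """Read a bundle written by save_bundle (unpickling can run code: trust the source)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```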

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambAttentionRegressor(*args: Any, **kwargs: Any)[source]

MambAttention regressor. This class extends the SklearnBaseRegressor class and uses the MambAttention model with the default MambAttention configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Mambular Attention model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • n_heads (int, default=8) – Number of attention heads in the model.

  • last_layer (str, default="attn") – Type of the last layer (e.g., ‘attn’).

  • n_mamba_per_attention (int, default=1) – Number of Mamba blocks per attention layer.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=4) – Kernel size of the convolution over columns.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • dt_rank (str, default="auto") – Rank of the projection used to compute the discretization time step Δ (dt) in the Mamba blocks.

  • d_state (int, default=128) – Dimensionality of the state in recurrent layers.

  • dt_scale (float, default=1.0) – Scaling factor for the initialization of the Δ (dt) projection.

  • dt_init (str, default="random") – Initialization method for the Δ (dt) projection weights.

  • dt_max (float, default=0.1) – Maximum value of Δ (dt) at initialization.

  • dt_min (float, default=1e-04) – Minimum value of Δ (dt) at initialization.

  • dt_init_floor (float, default=1e-04) – Lower bound (floor) applied to Δ (dt) at initialization.

  • norm (str, default="LayerNorm") – Type of normalization used in the model.

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process input sequences bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token for sequence pooling.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before passing to Mamba layers.

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • AD_weight_decay (bool, default=True) – Whether weight decay is applied to the A and D matrices of the state-space model.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • use_pscan (bool, default=False) – Whether to use the parallel scan (PSCAN) implementation for the state-space model.

  • n_attention_layers (int, default=1) – Number of attention layers in the model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
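The numerical_preprocessing="ple", n_bins and binning_strategy options above can be illustrated with a minimal, stdlib-only sketch of piecewise linear encoding (PLE). This is not deeptab's preprocessing code: the helper names bin_edges and ple_encode are hypothetical, and the quantile rule is a simple nearest-rank approximation.

```python
def bin_edges(values, n_bins, strategy="uniform"):
    """Return n_bins + 1 sorted bin edges spanning the range of `values`."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        step = (hi - lo) / n_bins
        return [lo + i * step for i in range(n_bins)] + [hi]
    if strategy == "quantile":
        ordered = sorted(values)
        # Nearest-rank quantiles: skewed data can produce repeated edges.
        return [ordered[round(i * (len(ordered) - 1) / n_bins)] for i in range(n_bins + 1)]
    raise ValueError(f"unknown binning strategy: {strategy!r}")


def ple_encode(x, edges):
    """Piecewise linear encoding of a scalar x against the given bin edges.

    Bins entirely below x encode as 1.0, bins entirely above x as 0.0,
    and the bin containing x as the fraction of that bin lying below x.
    """
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x >= right:
            encoding.append(1.0)
        elif x <= left:
            encoding.append(0.0)
        else:
            encoding.append((x - left) / (right - left))
    return encoding
```

Uniform edges split the observed range evenly, whereas quantile edges place roughly the same number of samples in each bin, which tends to suit skewed features.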

Examples

>>> from deeptab.models import MambAttentionRegressor
>>> model = MambAttentionRegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.
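When X_val is not given, val_size and random_state control a held-out split. The sketch below shows the idea with a seeded shuffle; it is an illustration under assumptions (the library may instead delegate to a function such as scikit-learn's train_test_split), and split_indices is a hypothetical helper.

```python
import random


def split_indices(n_samples, val_size=0.2, random_state=101):
    """Return (train_indices, val_indices) after a seeded shuffle."""
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split reproducible without touching global state.
    random.Random(random_state).shuffle(indices)
    n_val = int(round(n_samples * val_size))
    return indices[n_val:], indices[:n_val]
```

Calling it twice with the same random_state yields the same split, which is what makes runs with a fixed random_state comparable.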

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
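The metrics argument maps names to callables with a (y_true, y_pred) signature. A minimal sketch of that contract, applied to predictions that have already been computed, might look like the following; evaluate_predictions and the two metric functions are illustrative stand-ins rather than deeptab functions (scikit-learn's mean_absolute_error and max_error share the same signature).

```python
def evaluate_predictions(y_true, y_pred, metrics):
    """Apply every metric in a {name: callable(y_true, y_pred)} dict."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def max_error(y_true, y_pred):
    return max(abs(t - p) for t, p in zip(y_true, y_pred))
```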

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Directory where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.
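The patience, mode, lr_patience and lr_factor options read like the settings of PyTorch Lightning's EarlyStopping callback and PyTorch's ReduceLROnPlateau scheduler; assuming deeptab wires them up that way (not confirmed here), the stdlib sketch below replays a list of validation losses to show when training stops and when the learning rate is cut. simulate_schedule is a hypothetical helper, and its lr_patience/lr_factor defaults mirror ReduceLROnPlateau's own defaults, since the documented defaults above are None.

```python
def simulate_schedule(val_losses, lr, patience=15, lr_patience=10, lr_factor=0.1, mode="min"):
    """Replay per-epoch validation losses; return (epochs_run, final_lr)."""
    improved = (lambda new, best: new < best) if mode == "min" else (lambda new, best: new > best)
    best = None
    epochs_without_improvement = 0   # early-stopping counter
    bad_epochs_since_lr_cut = 0      # learning-rate scheduler counter
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or improved(loss, best):
            best = loss
            epochs_without_improvement = 0
            bad_epochs_since_lr_cut = 0
            continue
        epochs_without_improvement += 1
        bad_epochs_since_lr_cut += 1
        if bad_epochs_since_lr_cut > lr_patience:
            # Plateau: cut the learning rate and restart the scheduler counter.
            lr *= lr_factor
            bad_epochs_since_lr_cut = 0
        if epochs_without_improvement >= patience:
            # Early stopping: no improvement for `patience` consecutive epochs.
            return epoch, lr
    return len(val_losses), lr
```

For example, with patience=3, lr_patience=1 and lr_factor=0.5, the losses [1.0, 0.9, 0.95, 0.96, 0.97, 0.98] halve the learning rate once at epoch 4 and stop training at epoch 5.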

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
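In PyTorch, this kind of count is usually written as sum(p.numel() for p in model.parameters() if p.requires_grad). The sketch below mimics that idiom without depending on torch; FakeParam and count_params are hypothetical stand-ins, and whether deeptab computes the count exactly this way is an assumption.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class FakeParam:
    """Stand-in for torch.nn.Parameter: only a shape and a requires_grad flag."""
    shape: tuple
    requires_grad: bool = True

    def numel(self):
        # Number of scalar entries, like torch.Tensor.numel().
        return prod(self.shape)


def count_params(params, requires_grad=True):
    """Sum numel() over all params, or only trainable ones when requires_grad=True."""
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)
```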

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method pretrains the embeddings by optimizing them with respect to the neighborhood structure of the feature space. The embeddings are saved to save_path after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
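The neighborhood-based objective described above resembles an InfoNCE-style contrastive loss in which each sample's k_neighbors nearest samples in feature space act as positives and temperature sharpens the softmax over similarities. The sketch below illustrates that general technique under this assumption; it is not deeptab's _pretrain implementation, and knn_contrastive_loss is a hypothetical name.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_contrastive_loss(embeddings, features, k_neighbors=1, temperature=0.1):
    """InfoNCE-style loss: each row's k nearest rows in feature space are its positives."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Neighbors are chosen by squared Euclidean distance on the raw features.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(features[i], features[j])),
        )
        positives = others[:k_neighbors]
        # Temperature-scaled cosine similarities to every other row form the softmax logits.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n
```

Embeddings that keep feature-space neighbors close together give a lower loss than embeddings that scatter them.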

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.MambAttentionLSS(*args: Any, **kwargs: Any)[source]

MambAttention LSS for distributional regression. This class extends the SklearnBaseLSS class and uses the MambAttention model with the default MambAttention configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Default Mambular Attention model with predefined hyperparameters.

Parameters:
  • d_model (int, default=64) – Dimensionality of the model.

  • n_layers (int, default=4) – Number of layers in the model.

  • expand_factor (int, default=2) – Expansion factor for the feed-forward layers.

  • n_heads (int, default=8) – Number of attention heads in the model.

  • last_layer (str, default="attn") – Type of the last layer (e.g., ‘attn’).

  • n_mamba_per_attention (int, default=1) – Number of Mamba blocks per attention layer.

  • bias (bool, default=False) – Whether to use bias in the linear layers.

  • d_conv (int, default=4) – Kernel size of the convolution applied over the feature (column) sequence.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • dropout (float, default=0.0) – Dropout rate for regularization.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • dt_rank (str, default="auto") – Rank of the low-rank projection that produces the time step Δ in the Mamba blocks.

  • d_state (int, default=128) – Dimensionality of the state in recurrent layers.

  • dt_scale (float, default=1.0) – Scaling factor applied when initializing the Δ projection.

  • dt_init (str, default="random") – Initialization method for the Δ projection weights.

  • dt_max (float, default=0.1) – Maximum initial value of the time step Δ.

  • dt_min (float, default=1e-04) – Minimum initial value of the time step Δ.

  • dt_init_floor (float, default=1e-04) – Lower bound applied to the initial Δ values.

  • norm (str, default="LayerNorm") – Type of normalization used in the model.

  • activation (callable, default=nn.SiLU()) – Activation function for the model.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘max’, etc.).

  • bidirectional (bool, default=False) – Whether to process input sequences bidirectionally.

  • use_learnable_interaction (bool, default=False) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool, default=False) – Whether to append a CLS token for sequence pooling.

  • shuffle_embeddings (bool, default=False) – Whether to shuffle embeddings before passing to Mamba layers.

  • cat_encoding (str, default="int") – Encoding method for categorical features (‘int’, ‘one-hot’, etc.).

  • AD_weight_decay (bool, default=True) – Whether weight decay is applied to the A and D parameters of the state-space layers.

  • BC_layer_norm (bool, default=False) – Whether to apply layer normalization to the B and C matrices.

  • use_pscan (bool, default=False) – Whether to use the parallel scan (PSCAN) implementation for the state-space model.

  • n_attention_layers (int, default=1) – Number of attention layers in the model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
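In Mamba-style blocks, the dt_* parameters refer to the input-dependent time step Δ of the selective state-space model. The sketch below follows the initialization used in the reference Mamba implementation, assuming deeptab keeps it: Δ is sampled log-uniformly in [dt_min, dt_max], floored at dt_init_floor and stored as an inverse-softplus bias, while the Δ projection weights are scaled by dt_rank ** -0.5 * dt_scale. The function names are hypothetical, and plain Python stands in for torch.

```python
import math
import random


def init_dt_bias(d_inner, dt_min=1e-4, dt_max=0.1, dt_init_floor=1e-4, seed=0):
    """Return biases b such that softplus(b) is a log-uniform draw from [dt_min, dt_max], floored."""
    rng = random.Random(seed)
    biases = []
    for _ in range(d_inner):
        # Sample log(dt) uniformly, i.e. dt log-uniformly between dt_min and dt_max.
        log_dt = rng.random() * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
        dt = max(math.exp(log_dt), dt_init_floor)
        # Inverse softplus: softplus(x) = log(1 + e^x), so x = dt + log(1 - e^(-dt)).
        biases.append(dt + math.log(-math.expm1(-dt)))
    return biases


def init_dt_proj_weights(n, dt_rank, dt_scale=1.0, dt_init="random", seed=0):
    """Initialize n weights of the Δ projection with scale dt_rank ** -0.5 * dt_scale."""
    std = dt_rank ** -0.5 * dt_scale
    if dt_init == "constant":
        return [std] * n
    if dt_init == "random":
        rng = random.Random(seed)
        return [rng.uniform(-std, std) for _ in range(n)]
    raise NotImplementedError(f"unsupported dt_init: {dt_init!r}")
```

Storing the inverse softplus means that, once the model applies softplus to the projected value plus bias, the initial time steps land in the requested range.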

Examples

>>> from deeptab.models import MambAttentionLSS
>>> model = MambAttentionLSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Additional arguments specific to the chosen distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Directory where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Scoring metric. Currently, only the negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
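For family='normal', the negative log-likelihood behind the "NLL" score can be written out directly. The sketch below averages the per-sample Gaussian NLL; whether deeptab averages or sums, and how it parameterizes the scale, are assumptions here, and normal_nll is a hypothetical helper.

```python
import math


def normal_nll(y, mu, sigma):
    """Average negative log-likelihood of y under independent N(mu_i, sigma_i ** 2)."""
    total = 0.0
    for y_i, mu_i, sigma_i in zip(y, mu, sigma):
        variance = sigma_i ** 2
        total += 0.5 * math.log(2 * math.pi * variance) + (y_i - mu_i) ** 2 / (2 * variance)
    return total / len(y)
```

An overconfident prediction (a small sigma far from the target) is penalized heavily, so NLL rewards calibrated uncertainty and not only accurate means.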

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
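Returning the estimator from set_params follows the scikit-learn convention that utilities such as sklearn.base.clone and grid search rely on. The toy class below follows that convention; ToyEstimator is hypothetical, and its parameter names are simply borrowed from the configuration above.

```python
class ToyEstimator:
    """Minimal estimator following scikit-learn's get_params/set_params contract."""

    def __init__(self, d_model=64, n_layers=4):
        self.d_model = d_model
        self.n_layers = n_layers

    def get_params(self, deep=True):
        # `deep` matters only for estimators that contain other estimators.
        return {"d_model": self.d_model, "n_layers": self.n_layers}

    def set_params(self, **parameters):
        valid = self.get_params()
        for name, value in parameters.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self  # returning self allows chaining
```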

class deeptab.models.TabulaRNNClassifier(*args: Any, **kwargs: Any)[source]

TabulaRNN classifier. This class extends the SklearnBaseClassifier class and uses the TabulaRNN model with the default TabulaRNN configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:
  • model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.

  • n_layers (int, default=4) – Number of layers in the RNN.

  • rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • norm (str, default="RMSNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the RNN layers.

  • residuals (bool, default=False) – Whether to include residual connections in the RNN.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • rnn_activation (str, default="relu") – Activation function for the RNN layers.

  • dim_feedforward (int, default=256) – Size of the feedforward network.

  • d_conv (int, default=4) – Size of the convolutional layer for embedding features.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
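The pooling_method option reduces the sequence of per-feature representations to a single vector before the head. The sketch below shows "avg", "max" and "cls" pooling on plain lists; treating "cls" as the last position assumes the CLS token is appended to the end of the sequence (as described for use_cls in the Mamba-based models), and pool is a hypothetical helper rather than deeptab code.

```python
def pool(sequence, method="avg"):
    """Reduce a sequence of equal-length token vectors to a single vector."""
    if method == "avg":
        return [sum(column) / len(sequence) for column in zip(*sequence)]
    if method == "max":
        return [max(column) for column in zip(*sequence)]
    if method == "cls":
        # Assumption: the CLS token was appended, so it is the last position.
        return list(sequence[-1])
    raise ValueError(f"unknown pooling method: {method!r}")
```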

Examples

>>> from deeptab.models import TabulaRNNClassifier
>>> model = TabulaRNNClassifier(d_model=64)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
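
When X_val is None, build_model and fit carve a validation set out of X and y using val_size and random_state. A minimal stdlib sketch of that rule follows. train_val_split is a hypothetical name, and rounding the validation size up with math.ceil is an assumption borrowed from scikit-learn's convention; deeptab may shuffle and round differently, for example by delegating to a library splitter.

```python
import math
import random


def train_val_split(X, y, val_size=0.2, random_state=101, X_val=None, y_val=None):
    """Return (X_train, y_train, X_val, y_val) following the documented rules."""
    if X_val is not None:
        # An explicit validation set wins; X and y are used unsplit.
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return X, y, X_val, y_val

    n_val = math.ceil(len(X) * val_size)
    indices = list(range(len(X)))
    # Seeding a private RNG makes the split reproducible for a given random_state.
    random.Random(random_state).shuffle(indices)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])


X = list(range(10))
y = [2 * v for v in X]
X_tr, y_tr, X_va, y_va = train_val_split(X, y, val_size=0.2)
print(len(X_tr), len(X_va))  # -> 8 2
```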

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
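
The metrics argument pairs each metric with a flag that decides whether it receives predict_proba output or predict output. The stdlib sketch below reproduces that dispatch rule against a hypothetical FixedModel stand-in. evaluate, accuracy, log_loss and FixedModel here are illustrative re-implementations, not deeptab code; in practice you would pass scikit-learn metrics such as sklearn.metrics.accuracy_score.

```python
import math


def evaluate(model, X, y_true, metrics):
    """Score `model` with metrics given as {name: (metric_fn, needs_proba)}."""
    scores = {}
    labels = probas = None  # computed lazily, at most once each
    for name, (metric_fn, needs_proba) in metrics.items():
        if needs_proba:
            if probas is None:
                probas = model.predict_proba(X)
            scores[name] = metric_fn(y_true, probas)
        else:
            if labels is None:
                labels = model.predict(X)
            scores[name] = metric_fn(y_true, labels)
    return scores


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    # proba[i][k] is P(class k | sample i); labels are assumed to be 0..K-1.
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


class FixedModel:
    """Hypothetical stand-in for a fitted classifier with fixed probabilities."""

    def __init__(self, proba):
        self.proba = proba

    def predict_proba(self, X):
        return self.proba

    def predict(self, X):
        # Label = index of the most probable class in each row.
        return [max(range(len(row)), key=row.__getitem__) for row in self.proba]


model = FixedModel([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
scores = evaluate(model, None, [0, 1, 1],
                  {"accuracy": (accuracy, False), "log_loss": (log_loss, True)})
print(scores)
```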

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during validation.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
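
The patience/monitor/mode arguments and the lr_patience/lr_factor pair act on the same monitored validation curve. The simplified simulation below assumes early stopping behaves like PyTorch Lightning's EarlyStopping callback, which stops once `patience` epochs pass without improvement. It also assumes the learning-rate schedule behaves like PyTorch's ReduceLROnPlateau, which multiplies the rate by `lr_factor` after more than `lr_patience` bad epochs. simulate_training is hypothetical, and it ignores details such as improvement thresholds and cooldown.

```python
def simulate_training(monitored, patience=15, mode="min",
                      lr=1e-3, lr_patience=10, lr_factor=0.1):
    """Return (epochs_run, lr_used_per_epoch) for a sequence of monitored values."""
    if mode == "min":
        improved = lambda new, best: new < best
    else:
        improved = lambda new, best: new > best
    best = None
    bad_epochs_stop = 0  # counter for early stopping
    bad_epochs_lr = 0    # counter for the learning-rate scheduler
    lr_history = []
    for epoch, value in enumerate(monitored, start=1):
        lr_history.append(lr)  # rate used during this epoch
        if best is None or improved(value, best):
            best = value
            bad_epochs_stop = bad_epochs_lr = 0
            continue
        bad_epochs_stop += 1
        bad_epochs_lr += 1
        if bad_epochs_lr > lr_patience:
            lr *= lr_factor  # takes effect from the next epoch
            bad_epochs_lr = 0
        if bad_epochs_stop >= patience:
            return epoch, lr_history
    return len(monitored), lr_history


epochs, lrs = simulate_training([1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99],
                                patience=3, lr=1.0, lr_patience=1, lr_factor=0.5)
print(epochs, lrs)  # -> 5 [1.0, 1.0, 1.0, 1.0, 0.5]
```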

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
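
Counting parameters with the requires_grad switch usually reduces to the common PyTorch idiom of summing numel() over model.parameters(). The sketch below assumes get_number_of_params works this way internally, which this page does not confirm. count_parameters and FakeParam are hypothetical; FakeParam stands in for torch.nn.Parameter so the example runs without PyTorch. With PyTorch installed you would call count_parameters(module.parameters()).

```python
def count_parameters(parameters, requires_grad=True):
    """Sum p.numel() over parameters; with requires_grad=True, skip frozen ones."""
    return sum(p.numel() for p in parameters
               if p.requires_grad or not requires_grad)


class FakeParam:
    """Hypothetical stand-in exposing numel() and requires_grad like a torch Parameter."""

    def __init__(self, *shape, requires_grad=True):
        self.shape = shape
        self.requires_grad = requires_grad

    def numel(self):
        n = 1
        for dim in self.shape:
            n *= dim
        return n


params = [FakeParam(64, 32), FakeParam(32), FakeParam(10, 64, requires_grad=False)]
print(count_parameters(params))                       # -> 2080 (trainable only)
print(count_parameters(params, requires_grad=False))  # -> 2720 (all)
```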

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
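
predict and predict_proba are normally two views of the same output. Under the usual scikit-learn convention, the predicted label is the class with the highest probability in the corresponding predict_proba row. The sketch below assumes deeptab follows this convention. proba_to_labels is a hypothetical helper, and classes plays the role of the fitted label order, which scikit-learn estimators store as classes_.

```python
def proba_to_labels(probabilities, classes):
    """Return, for each row, the class whose probability is highest."""
    labels = []
    for row in probabilities:
        # Rows of predict_proba output are expected to sum to 1.
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError(f"row does not sum to 1: {row}")
        best = max(range(len(row)), key=row.__getitem__)  # first index wins ties
        labels.append(classes[best])
    return labels


proba = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
print(proba_to_labels(proba, ["cat", "dog", "bird"]))  # -> ['dog', 'cat']
```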

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
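
A neighborhood-based InfoNCE loss makes the contrastive objective concrete. Rows that are k-nearest neighbors in the raw feature space count as positives, and their embeddings are pulled together relative to all other rows. temperature sharpens the resulting softmax. This is a generic stdlib sketch, not the loss that deeptab's _pretrain() implements. The helper names are hypothetical, and only k_neighbors and temperature mirror pretrain()'s parameters.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_indices(features, i, k):
    """Indices of the k rows closest to row i in raw feature space (Euclidean)."""
    others = [j for j in range(len(features)) if j != i]
    others.sort(key=lambda j: math.dist(features[i], features[j]))
    return others[:k]


def neighborhood_contrastive_loss(features, embeddings, k_neighbors=1, temperature=0.1):
    """Mean over anchors of -log(sum over positives / sum over all other rows)."""
    n = len(features)
    total = 0.0
    for i in range(n):
        positives = knn_indices(features, i, k_neighbors)
        # Temperature-scaled cosine similarities to every other row.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        denominator = sum(math.exp(v) for v in logits.values())
        numerator = sum(math.exp(logits[j]) for j in positives)
        total += -math.log(numerator / denominator)
    return total / n


features = [[0.0], [0.1], [5.0], [5.1]]  # two tight pairs of rows
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(neighborhood_contrastive_loss(features, aligned))     # close to 0
print(neighborhood_contrastive_loss(features, misaligned))  # close to 10
```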

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabulaRNNRegressor(*args: Any, **kwargs: Any)[source]

TabulaRNN regressor. This class extends the SklearnBaseRegressor class and uses the TabulaRNN model with the default TabulaRNN configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:
  • model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.

  • n_layers (int, default=4) – Number of layers in the RNN.

  • rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • norm (str, default="RMSNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the RNN layers.

  • residuals (bool, default=False) – Whether to include residual connections in the RNN.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • rnn_activation (str, default="relu") – Activation function for the RNN layers.

  • dim_feedforward (int, default=256) – Size of the feedforward network.

  • d_conv (int, default=4) – Size of the convolutional layer for embedding features.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
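
pooling_method decides how the per-token outputs of the recurrent layers are collapsed into one vector for the head. Suppose each input feature becomes one token of width d_model, so a row yields an (n_features, d_model) sequence. The two documented options can then be sketched as below. pool_sequence is a hypothetical stdlib helper, and placing the CLS token at position 0 is an assumption; the library may put it elsewhere.

```python
def pool_sequence(tokens, pooling_method="avg"):
    """Collapse a list of n_tokens vectors (each of length d_model) into one vector."""
    if pooling_method == "avg":
        # Mean over the token axis, computed column by column.
        return [sum(column) / len(tokens) for column in zip(*tokens)]
    if pooling_method == "cls":
        # Assumption: a learned CLS token was prepended at position 0.
        return list(tokens[0])
    raise ValueError(f"unsupported pooling_method: {pooling_method!r}")


sequence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 tokens, d_model = 2
print(pool_sequence(sequence))         # -> [3.0, 4.0]
print(pool_sequence(sequence, "cls"))  # -> [1.0, 2.0]
```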

Examples

>>> from deeptab.models import TabulaRNNRegressor
>>> model = TabulaRNNRegressor(d_model=64)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during validation.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
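
Unlike the classifier, the regressor's metrics dictionary maps names directly to callables of the form metric(y_true, y_pred), each applied to the output of predict. The sketch below shows that rule with hypothetical stdlib re-implementations of two common metrics. With deeptab you would normally pass scikit-learn functions such as sklearn.metrics.mean_squared_error instead.

```python
def evaluate_predictions(y_true, y_pred, metrics):
    """Apply every metric(y_true, y_pred) in `metrics` and collect the scores."""
    return {name: metric(y_true, y_pred) for name, metric in metrics.items()}


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


scores = evaluate_predictions([1.0, 2.0, 3.0], [1.5, 2.0, 2.0],
                              {"mse": mean_squared_error, "mae": mean_absolute_error})
print(scores)
```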

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Additional keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training target values.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation target values.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabulaRNNLSS(*args: Any, **kwargs: Any)[source]

TabulaRNN for distributional regression. This class extends the SklearnBaseLSS class and uses the TabulaRNN model with the default TabulaRNN configuration. Supports RNN, LSTM, GRU, mLSTM, and sLSTM architectures.
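
Distributional regression fits the parameters of a whole predictive distribution instead of a single point estimate. A common formulation for a normal family has the network output a location and an unconstrained scale per sample. The scale is mapped through a positive link such as softplus, and training minimizes the Gaussian negative log-likelihood. The stdlib sketch below illustrates that objective. The helper names and the softplus link are assumptions, and deeptab's exact parameterization may differ.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_nll(y, mu, raw_scale):
    """Mean NLL of y under Normal(mu, sigma) with sigma = softplus(raw_scale)."""
    total = 0.0
    for y_i, mu_i, raw_i in zip(y, mu, raw_scale):
        sigma = softplus(raw_i)
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (y_i - mu_i) ** 2 / (2 * sigma ** 2)
    return total / len(y)


raw_for_unit_sigma = math.log(math.e - 1)  # softplus(raw) == 1
print(gaussian_nll([0.0], [0.0], [raw_for_unit_sigma]))  # 0.5 * log(2*pi), about 0.919
```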

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:
  • model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.

  • n_layers (int, default=4) – Number of layers in the RNN.

  • rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • norm (str, default="RMSNorm") – Normalization method to be used.

  • activation (callable, default=nn.SELU()) – Activation function for the RNN layers.

  • residuals (bool, default=False) – Whether to include residual connections in the RNN.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.

  • layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.

  • bias (bool, default=True) – Whether to use bias in the linear layers.

  • rnn_activation (str, default="relu") – Activation function for the RNN layers.

  • dim_feedforward (int, default=256) – Size of the feedforward network.

  • d_conv (int, default=4) – Size of the convolutional layer for embedding features.

  • dilation (int, default=1) – Dilation factor for the convolution.

  • conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
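
n_bins and binning_strategy determine the bin edges used by binning-based encodings such as PLE. The stdlib sketch below contrasts the two strategies on a column with one outlier. Uniform edges split the value range into equal widths, while quantile edges put roughly the same number of rows in each bin. bin_edges and assign_bins are hypothetical helpers. The linear interpolation between order statistics mirrors NumPy's default quantile method, and deeptab's preprocessor may compute edges differently.

```python
import bisect


def bin_edges(values, n_bins=4, strategy="uniform"):
    """Return n_bins + 1 ascending edges for `values`."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins + 1)]
    if strategy == "quantile":
        s = sorted(values)
        edges = []
        for i in range(n_bins + 1):
            # Linear interpolation between neighboring order statistics.
            pos = i * (len(s) - 1) / n_bins
            lower = int(pos)
            upper = min(lower + 1, len(s) - 1)
            edges.append(s[lower] + (pos - lower) * (s[upper] - s[lower]))
        return edges
    raise ValueError(f"unknown strategy: {strategy!r}")


def assign_bins(values, edges):
    """Bin index of each value; values on the last edge go into the last bin."""
    last = len(edges) - 2
    return [min(max(bisect.bisect_right(edges, v) - 1, 0), last) for v in values]


column = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]  # one large outlier
uniform = bin_edges(column, n_bins=4)       # [0.0, 25.0, 50.0, 75.0, 100.0]
quantile = bin_edges(column, n_bins=4, strategy="quantile")
print(assign_bins(column, uniform))   # nine values crowd into bin 0
print(assign_bins(column, quantile))  # rows spread over all four bins
```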

Examples

>>> from deeptab.models import TabulaRNNLSS
>>> model = TabulaRNNLSS(model_type='LSTM', d_model=128, n_layers=4)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Mapping of metric names to callables (e.g. torchmetrics metrics) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.

Returns:

self – The built distributional regressor.

Return type:

object
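The interaction between val_size, X_val/y_val and random_state can be sketched in plain Python. The helper split_train_val below is hypothetical. It only mirrors the rules stated above: hold out a val_size fraction unless an explicit validation set is given. It is not deeptab's internal splitting code.

```python
import random


def split_train_val(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Hypothetical helper mirroring the documented split rules."""
    if X_val is not None:
        # An explicit validation set is used as-is; val_size is ignored.
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return X, y, X_val, y_val

    # Otherwise shuffle indices reproducibly and hold out a val_size fraction.
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)
    n_val = int(round(len(X) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = [[float(i)] for i in range(10)]
y = [float(i) for i in range(10)]
X_tr, y_tr, X_va, y_va = split_train_val(X, y, val_size=0.2)
print(len(X_tr), len(X_va))  # 8 2
```

Using the same random_state twice yields the same split, which keeps validation scores comparable across runs.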

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
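encode processes the input in chunks of batch_size. The sketch below shows that batching pattern. embed_row is a hypothetical stand-in for the trained embedding layer. The real method returns a torch.Tensor rather than nested lists.

```python
from typing import Callable, List, Sequence


def encode_in_batches(
    rows: Sequence[Sequence[float]],
    embed_row: Callable[[Sequence[float]], List[float]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Encode rows batch by batch and concatenate the results in input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    encoded: List[List[float]] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # the last batch may be smaller
        encoded.extend(embed_row(row) for row in batch)
    return encoded


def toy_embed(row: Sequence[float]) -> List[float]:
    """Hypothetical embedding: map each feature x to the pair (x, x + 1.0)."""
    return [v for x in row for v in (x, x + 1.0)]


out = encode_in_batches([[1.0], [2.0], [3.0]], toy_embed, batch_size=2)
print(out)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
```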

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Additional arguments specific to the chosen distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

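  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.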

Returns:

self – The fitted regressor.

Return type:

object
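patience, monitor and mode together define early stopping. Below is a simplified, dependency-free version of the usual rule, similar to PyTorch Lightning's EarlyStopping callback with min_delta=0. Training stops once patience consecutive epochs fail to improve the monitored value. The function early_stopping_epoch is hypothetical, and the exact callback configuration deeptab builds is not shown.

```python
import operator
from typing import Sequence, Tuple


def early_stopping_epoch(
    history: Sequence[float], patience: int = 15, mode: str = "min"
) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch) for a series of monitored values.

    Hypothetical helper: stops once `patience` consecutive epochs fail to
    improve on the best value seen so far (no min_delta).
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    improved = operator.lt if mode == "min" else operator.gt
    best_value, best_epoch, bad_epochs = None, -1, 0
    for epoch, value in enumerate(history):
        if best_value is None or improved(value, best_value):
            best_value, best_epoch, bad_epochs = value, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch  # training would stop here
    return len(history) - 1, best_epoch  # ran out of epochs (max_epochs)


val_loss = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(early_stopping_epoch(val_loss, patience=3, mode="min"))  # (5, 2)
```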

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
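With a PyTorch module, such a count is typically sum(p.numel() for p in model.parameters() if p.requires_grad). The dependency-free sketch below reproduces that arithmetic on a hypothetical inventory of parameter shapes, which makes the effect of requires_grad easy to check by hand.

```python
from math import prod
from typing import List, Tuple

# Hypothetical parameter inventory: (name, shape, requires_grad).
Params = List[Tuple[str, Tuple[int, ...], bool]]


def count_params(params: Params, requires_grad: bool = True) -> int:
    """Count elements, restricted to trainable tensors when requires_grad is True."""
    return sum(
        prod(shape)
        for _, shape, trainable in params
        if trainable or not requires_grad
    )


layers: Params = [
    ("linear.weight", (128, 64), True),     # 8192 elements
    ("linear.bias", (128,), True),          # 128 elements
    ("embedding.weight", (10, 64), False),  # frozen: 640 elements
]
print(count_params(layers))                       # 8320
print(count_params(layers, requires_grad=False))  # 8960
```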

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
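The prune_by_epoch / prune_epoch rule can be illustrated separately from the optimizer. The sketch below replaces Bayesian optimization with a fixed set of candidate learning curves. It shows one plausible pruning criterion: a trial is stopped if its validation loss at prune_epoch is worse than the best loss any earlier trial reached at that epoch. The criterion deeptab actually applies is an assumption here, and only the prune_by_epoch=True case is shown.

```python
from typing import Dict, List, Optional


def run_trials(
    curves: Dict[str, List[float]], prune_epoch: int = 5
) -> Dict[str, Optional[float]]:
    """Return each trial's best validation loss, or None if it was pruned.

    `curves` maps a hypothetical trial name to its per-epoch validation losses.
    """
    best_at_prune_epoch = float("inf")
    results: Dict[str, Optional[float]] = {}
    for name, losses in curves.items():
        if len(losses) > prune_epoch:
            loss_at_epoch = losses[prune_epoch]
            if loss_at_epoch > best_at_prune_epoch:
                results[name] = None  # pruned: the remaining epochs are skipped
                continue
            best_at_prune_epoch = loss_at_epoch
        results[name] = min(losses)
    return results


curves = {
    "trial_a": [1.0, 0.8, 0.6, 0.5],
    "trial_b": [1.2, 1.0, 0.9, 0.4],  # pruned at epoch 2 despite a low final loss
    "trial_c": [0.9, 0.6, 0.5, 0.45],
}
print(run_trials(curves, prune_epoch=2))
```

trial_b shows the trade-off of epoch-based pruning: it saves compute but can discard slow starters.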

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The metric to use. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
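For family='normal', the negative log-likelihood averages 0.5·log(2πσ²) + (y − μ)²/(2σ²) over samples, and lower is better. The sketch assumes the model yields a mean and a standard deviation per sample. How deeptab parameterizes σ internally is not shown.

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean negative log-likelihood of y under N(mu, sigma**2)."""
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("sigma must be positive")
        total += 0.5 * math.log(2 * math.pi * si**2) + (yi - mi) ** 2 / (2 * si**2)
    return total / len(y)


# A perfect mean with unit variance gives 0.5 * log(2*pi), about 0.9189.
print(round(gaussian_nll([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]), 4))  # 0.9189
```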

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
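get_params and set_params follow the scikit-learn estimator convention. This convention is what allows tools such as sklearn.base.clone or GridSearchCV to work with these models. The toy class below is hypothetical and uses only the standard library. It reimplements the convention by reading constructor argument names through inspect.signature, roughly as sklearn.base.BaseEstimator does.

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator following the sklearn get/set_params contract."""

    def __init__(self, d_model=64, n_layers=4, dropout=0.0):
        self.d_model = d_model
        self.n_layers = n_layers
        self.dropout = dropout

    def get_params(self, deep=True):
        # `deep` is accepted for API compatibility; this toy has no nested estimators.
        names = [p for p in inspect.signature(type(self).__init__).parameters if p != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **parameters):
        valid = self.get_params()
        for key, value in parameters.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter {key!r} for {type(self).__name__}")
            setattr(self, key, value)
        return self  # returning self allows chaining


est = ToyEstimator().set_params(n_layers=8)
print(est.get_params())  # {'d_model': 64, 'n_layers': 8, 'dropout': 0.0}
clone = ToyEstimator(**est.get_params())  # clone-like re-instantiation
```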

class deeptab.models.TabMClassifier(*args: Any, **kwargs: Any)[source]

TabM classifier. This class extends the SklearnBaseClassifier class and uses the TabM model with the default TabM configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabM model with batch ensembling and predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(512, 512, 128)) – Sizes of the layers in the model.

  • activation (callable, default=nn.ReLU()) – Activation function for the model layers.

  • dropout (float, default=0.3) – Dropout rate for regularization.

  • norm (str, default=None) – Normalization method to be used, if any.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the model.

  • ensemble_size (int, default=32) – Number of ensemble members for batch ensembling.

  • ensemble_scaling_in (bool, default=True) – Whether to use input scaling for each ensemble member.

  • ensemble_scaling_out (bool, default=True) – Whether to use output scaling for each ensemble member.

  • ensemble_bias (bool, default=True) – Whether to use a unique bias term for each ensemble member.

  • scaling_init ({"ones", "random-signs", "normal"}, default="normal") – Initialization method for scaling weights.

  • average_ensembles (bool, default=False) – Whether to average the outputs of the ensembles.

  • model_type ({"mini", "full"}, default="mini") – Model type to use (‘mini’ for reduced version, ‘full’ for complete model).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import TabMClassifier
>>> model = TabMClassifier(ensemble_size=32, model_type='full')
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
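The ensemble_* options describe batch ensembling. All members share one weight matrix W. Member k differs only through an input scaling r_k (ensemble_scaling_in), an output scaling s_k (ensemble_scaling_out) and a bias b_k (ensemble_bias), so y_k = s_k ⊙ (W (r_k ⊙ x)) + b_k. The dependency-free sketch below applies that formula to a single linear layer. It makes three assumptions: the toy shapes, the values chosen for r, s and b (scaling_init is ignored), and the use of a plain mean for average_ensembles. Real TabM layers operate on batched tensors.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def batch_ensemble_linear(
    x: Vector, W: Matrix, r: List[Vector], s: List[Vector], b: List[Vector]
) -> List[Vector]:
    """Return one output vector per ensemble member, all sharing the weight W."""
    outputs = []
    for r_k, s_k, b_k in zip(r, s, b):
        scaled_in = [ri * xi for ri, xi in zip(r_k, x)]  # per-member input scaling
        hidden = matvec(W, scaled_in)                    # shared weights
        # per-member output scaling plus per-member bias
        outputs.append([si * h + bi for si, h, bi in zip(s_k, hidden, b_k)])
    return outputs


def average_members(outputs: List[Vector]) -> Vector:
    """Plain mean over members (one reading of average_ensembles=True)."""
    k = len(outputs)
    return [sum(col) / k for col in zip(*outputs)]


W = [[1.0, 2.0], [0.0, 1.0]]  # shared 2x2 weight
x = [1.0, 1.0]
r = [[1.0, 1.0], [2.0, 0.0]]  # ensemble_size = 2
s = [[1.0, 1.0], [1.0, -1.0]]
b = [[0.0, 0.0], [0.5, 0.5]]
members = batch_ensemble_linear(x, W, r, s, b)
print(members)                   # [[3.0, 1.0], [2.5, 0.5]]
print(average_members(members))  # [2.75, 0.75]
```

Because only r_k, s_k and b_k are per member, ensemble_size adds few parameters compared with training independent networks.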
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
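The metrics argument pairs each metric with a flag that decides whether evaluate passes it predict_proba output (True) or predict output (False). The self-contained sketch below shows that dispatch. It uses a hypothetical ConstantModel and hand-written metrics in place of a fitted TabMClassifier. With a real model you would pass scikit-learn functions, for example the (log_loss, True) pair that score uses by default.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Metric = Tuple[Callable, bool]  # (metric function, needs probability scores?)


class ConstantModel:
    """Hypothetical binary classifier that always outputs the same probabilities."""

    def __init__(self, p_positive: float):
        self.p = p_positive

    def predict_proba(self, X) -> List[List[float]]:
        return [[1 - self.p, self.p] for _ in X]

    def predict(self, X) -> List[int]:
        return [int(row[1] >= 0.5) for row in self.predict_proba(X)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss_from_proba(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    return -sum(math.log(row[t]) for t, row in zip(y_true, proba)) / len(y_true)


def evaluate(model, X, y_true, metrics: Dict[str, Metric]) -> Dict[str, float]:
    scores = {}
    for name, (fn, needs_proba) in metrics.items():
        preds = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = fn(y_true, preds)
    return scores


scores = evaluate(
    ConstantModel(0.75),
    X=[[0], [1], [2], [3]],
    y_true=[1, 1, 1, 0],
    metrics={"accuracy": (accuracy, False), "log_loss": (log_loss_from_proba, True)},
)
print(scores["accuracy"])  # 0.75
```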

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
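lr_patience and lr_factor describe a reduce-on-plateau schedule. After lr_patience epochs without improvement in validation loss, the learning rate is multiplied by lr_factor. The simulation below assumes the simplest form of the rule: no threshold or cooldown, and the counter resets after each reduction. It triggers on the first bad epoch beyond lr_patience, as torch.optim.lr_scheduler.ReduceLROnPlateau does. Whether deeptab configures that scheduler with exactly these settings is not stated here, and simulate_plateau_lr is a hypothetical name.

```python
from typing import List, Sequence


def simulate_plateau_lr(
    val_losses: Sequence[float], lr: float = 0.1, lr_patience: int = 2, lr_factor: float = 0.5
) -> List[float]:
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    bad_epochs = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:  # already waited lr_patience epochs
                lr *= lr_factor
                bad_epochs = 0
        schedule.append(lr)
    return schedule


print(simulate_plateau_lr([1.0, 0.9, 0.95, 0.93, 0.92, 0.91]))
# [0.1, 0.1, 0.1, 0.1, 0.05, 0.05]
```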

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method pretrains the embeddings by optimizing them with respect to the neighborhood structure of the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
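One common formulation of neighborhood-based contrastive pretraining treats each sample's k_neighbors nearest neighbors in the raw feature space as positives. It scores them against all other samples with a temperature-scaled softmax over cosine similarities of the embeddings, an InfoNCE-style loss. This formulation is an assumption, not deeptab's exact loss, and the mapping of use_positive, use_negative and pool_sequence is not modeled. The stdlib-only sketch computes the loss value. Gradients and the optimizer are omitted.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbors(features: List[Vec], i: int, k: int) -> List[int]:
    """Indices of the k samples closest to features[i] (Euclidean), excluding i."""
    dists = [(math.dist(features[i], f), j) for j, f in enumerate(features) if j != i]
    return [j for _, j in sorted(dists)[:k]]


def neighbor_contrastive_loss(
    features: List[Vec], embeddings: List[Vec], k_neighbors: int = 1, temperature: float = 0.1
) -> float:
    """Mean InfoNCE-style loss with feature-space neighbors as positives."""
    n = len(features)
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities of sample i to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())  # log-sum-exp with max shift for stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        positives = nearest_neighbors(features, i, k_neighbors)
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


features = [[0.0], [0.1], [5.0], [5.1]]  # two tight clusters
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
loss_aligned = neighbor_contrastive_loss(features, aligned)
loss_misaligned = neighbor_contrastive_loss(features, misaligned)
print(loss_aligned < loss_misaligned)  # True
```

Embeddings that keep feature-space neighbors close get a low loss. A smaller temperature sharpens the softmax and penalizes misplaced neighbors more strongly.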

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
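The save/load bundle described above (config, fitted preprocessor, feature metadata and weights) can be illustrated with a standard-library pickle round trip. This is only an analogy. The dictionary keys, the use of pickle and the helper names save_bundle/load_bundle are hypothetical, and deeptab's actual on-disk format is not reproduced.

```python
import os
import pickle
import tempfile


def save_bundle(path, config, preprocessor_state, feature_info, weights):
    """Write everything inference needs into a single file (hypothetical layout)."""
    bundle = {
        "config": config,
        "preprocessor": preprocessor_state,
        "feature_info": feature_info,
        "weights": weights,
    }
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    save_bundle(
        path,
        config={"ensemble_size": 32, "model_type": "mini"},
        preprocessor_state={"age": {"min": 18.0, "max": 90.0}},
        feature_info={"numerical": ["age"], "categorical": []},
        weights={"layer0.weight": [[0.1, -0.2]]},
    )
    restored = load_bundle(path)
print(restored["config"]["ensemble_size"])  # 32
```

Bundling the fitted preprocessor with the weights matters because predictions on raw inputs are only reproducible if the same feature transformations are reapplied.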

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabMRegressor(*args: Any, **kwargs: Any)[source]

TabM regressor. This class extends the SklearnBaseRegressor class and uses the TabM model with the default TabM configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabM model with batch ensembling and predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(512, 512, 128)) – Sizes of the layers in the model.

  • activation (callable, default=nn.ReLU()) – Activation function for the model layers.

  • dropout (float, default=0.3) – Dropout rate for regularization.

  • norm (str, default=None) – Normalization method to be used, if any.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the model.

  • ensemble_size (int, default=32) – Number of ensemble members for batch ensembling.

  • ensemble_scaling_in (bool, default=True) – Whether to use input scaling for each ensemble member.

  • ensemble_scaling_out (bool, default=True) – Whether to use output scaling for each ensemble member.

  • ensemble_bias (bool, default=True) – Whether to use a unique bias term for each ensemble member.

  • scaling_init ({"ones", "random-signs", "normal"}, default="normal") – Initialization method for scaling weights.

  • average_ensembles (bool, default=False) – Whether to average the outputs of the ensembles.

  • model_type ({"mini", "full"}, default="mini") – Model type to use (‘mini’ for reduced version, ‘full’ for complete model).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import TabMRegressor
>>> model = TabMRegressor(ensemble_size=32, model_type='full')
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses the predict method to generate predictions and computes each metric.

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
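How patience, monitor/mode, lr_patience and lr_factor interact can be pictured with the minimal sketch below. It assumes semantics similar to PyTorch Lightning's EarlyStopping callback and PyTorch's ReduceLROnPlateau scheduler; PlateauTracker is a hypothetical helper, not part of deeptab, and its lr_patience default of 10 is an illustrative choice (the documented default is None).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlateauTracker:
    """Hypothetical helper: early stopping plus learning-rate decay on a plateau."""

    lr: float
    patience: int = 15        # epochs without improvement before stopping
    lr_patience: int = 10     # illustrative value; epochs before decaying lr
    lr_factor: float = 0.1    # multiplicative learning-rate decay
    mode: str = "min"         # "min" for losses, "max" for scores such as accuracy
    best: Optional[float] = None
    stale_epochs: int = 0     # epochs since the last improvement
    stale_lr_epochs: int = 0  # epochs since the last improvement or lr decay

    def _improved(self, value: float) -> bool:
        if self.best is None:
            return True
        return value < self.best if self.mode == "min" else value > self.best

    def step(self, value: float) -> bool:
        """Record one epoch's monitored value; return True if training should stop."""
        if self._improved(value):
            self.best = value
            self.stale_epochs = 0
            self.stale_lr_epochs = 0
        else:
            self.stale_epochs += 1
            self.stale_lr_epochs += 1
            if self.stale_lr_epochs >= self.lr_patience:
                self.lr *= self.lr_factor  # decay and restart the lr counter
                self.stale_lr_epochs = 0
        return self.stale_epochs >= self.patience


tracker = PlateauTracker(lr=1e-3, patience=3, lr_patience=2)
for epoch, val_loss in enumerate([0.9, 0.7, 0.71, 0.72, 0.73]):
    if tracker.step(val_loss):
        print(f"early stop after epoch {epoch}, lr={tracker.lr:.0e}")
        break
```

Real callbacks also support a minimum improvement threshold and cooldown periods, which this sketch omits for clarity.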

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
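In PyTorch such a count is conventionally computed as sum(p.numel() for p in model.parameters() if p.requires_grad). The dependency-free sketch below mirrors that idiom; FakeParam is a hypothetical stand-in for torch.nn.Parameter, and count_params is illustrative rather than deeptab's implementation.

```python
from dataclasses import dataclass
from math import prod
from typing import Iterable, Optional, Tuple


@dataclass
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter (only shape and flag matter)."""

    shape: Tuple[int, ...]
    requires_grad: bool = True

    def numel(self) -> int:
        return prod(self.shape)  # number of scalar entries in the tensor


def count_params(params: Optional[Iterable[FakeParam]], requires_grad: bool = True) -> int:
    """Count trainable parameters (default) or all parameters."""
    if params is None:
        raise ValueError("The model must be built before counting parameters.")
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)


layer = [
    FakeParam((32, 64)),                        # linear weight: 2048 entries
    FakeParam((32,)),                           # linear bias: 32 entries
    FakeParam((100, 16), requires_grad=False),  # frozen embedding: 1600 entries
]
print(count_params(layer), count_params(layer, requires_grad=False))
```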

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
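The pruning rule is not spelled out in this docstring. One plausible reading of prune_by_epoch=True, sketched below purely as an assumption, is that a trial is stopped once it reaches prune_epoch with a validation loss worse than the best loss any earlier trial had at that same epoch. should_prune and run_trials are hypothetical names, and the Bayesian search over hyperparameters itself is not shown.

```python
from typing import List, Optional


def should_prune(losses_so_far: List[float], best_at_prune_epoch: Optional[float],
                 prune_epoch: int = 5) -> bool:
    """Decide once, exactly at prune_epoch, whether to abandon the current trial."""
    if len(losses_so_far) != prune_epoch or best_at_prune_epoch is None:
        return False
    return losses_so_far[-1] > best_at_prune_epoch


def run_trials(curves: List[List[float]], prune_epoch: int = 5) -> List[bool]:
    """Replay per-epoch validation curves of successive trials; return which were pruned."""
    best_at_epoch: Optional[float] = None
    pruned = []
    for curve in curves:
        stopped = any(
            should_prune(curve[:epoch], best_at_epoch, prune_epoch)
            for epoch in range(1, len(curve) + 1)
        )
        pruned.append(stopped)
        if len(curve) >= prune_epoch:
            value = curve[prune_epoch - 1]  # loss reached at prune_epoch
            best_at_epoch = value if best_at_epoch is None else min(best_at_epoch, value)
    return pruned


print(run_trials([[1.0, 0.9, 0.8], [1.1, 1.0, 0.5], [0.95, 0.85, 0.7]], prune_epoch=2))
```

The example shows the trade-off: the second trial is pruned at epoch 2 even though its curve would have finished lowest, so a very small prune_epoch can discard slow starters.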

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
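The exact loss used by pretrain() is not documented here. As an illustration only, the sketch below implements an InfoNCE-style objective under the assumption that each row's positives are its k nearest neighbours in the raw feature space and that every other row acts as a contrast term in the softmax denominator; temperature divides the cosine similarities, as the parameter name suggests. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn(features: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k rows closest to row i (Euclidean, raw features), excluding i."""
    others = sorted((math.dist(features[i], features[j]), j) for j in range(len(features)) if j != i)
    return [j for _, j in others[:k]]


def neighbour_contrastive_loss(embeddings, features, k: int = 10, temperature: float = 0.1) -> float:
    """Mean over rows of -log softmax(similarity / temperature) at the neighbour positions."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in range(n) if j != i}
        top = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        positives = knn(features, i, k)
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n


features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]     # neighbours embed close together
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]  # neighbours embed far apart
print(neighbour_contrastive_loss(aligned, features, k=1),
      neighbour_contrastive_loss(misaligned, features, k=1))
```

Embeddings that keep feature-space neighbours together receive a much lower loss, which is the property pretraining optimizes for.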

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
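The save/load contract can be illustrated with a plain pickle round trip. The bundle keys below are assumptions chosen to match the four components listed above (config, fitted preprocessor, feature metadata, weights); deeptab's on-disk format may differ, and the helper names are hypothetical. Because torch.save is itself pickle-based, only load files from trusted sources.

```python
import pickle
from typing import Any, Dict, Optional


def save_bundle(path: str, config: Dict[str, Any], preprocessor: Any,
                feature_info: Dict[str, Any], weights: Optional[Dict[str, Any]]) -> None:
    """Write everything inference needs into a single file."""
    if weights is None:
        raise ValueError("The model has not been fitted yet.")
    bundle = {"config": config, "preprocessor": preprocessor,
              "feature_info": feature_info, "weights": weights}
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path: str) -> Dict[str, Any]:
    """Read a bundle written by save_bundle(); unpickling can run code, so trust the source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Keeping the preprocessor in the same file as the weights ensures that a restored model transforms new inputs exactly as it did during training.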

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
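Conceptually, score() predicts on X and passes the true and predicted values to metric(y_true, y_pred). The stand-alone sketch below shows that contract with hand-written error metrics and a toy predictor, both illustrative. In real use, any callable with that signature works, for example sklearn.metrics.mean_absolute_error.

```python
from typing import Callable, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def score(predict: Callable, X, y, metric: Callable = mean_squared_error) -> float:
    """Predict on X, then evaluate metric(y_true, y_pred)."""
    return metric(y, predict(X))


def predict(X):  # toy model standing in for a fitted regressor
    return [2.0 * x for x in X]


X, y = [1.0, 2.0, 3.0], [2.0, 4.0, 8.0]  # absolute errors: 0, 0, 2
print(score(predict, X, y))                              # MSE = 4/3
print(score(predict, X, y, metric=mean_absolute_error))  # MAE = 2/3
```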

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.TabMLSS(*args: Any, **kwargs: Any)[source]

TabM for distributional regression. This class extends the SklearnBaseLSS class and uses the TabM model with the default TabM configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabM model with batch ensembling and predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(512, 512, 128)) – Sizes of the layers in the model.

  • activation (callable, default=nn.ReLU()) – Activation function for the model layers.

  • dropout (float, default=0.3) – Dropout rate for regularization.

  • norm (str, default=None) – Normalization method to be used, if any.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the model.

  • ensemble_size (int, default=32) – Number of ensemble members for batch ensembling.

  • ensemble_scaling_in (bool, default=True) – Whether to use input scaling for each ensemble member.

  • ensemble_scaling_out (bool, default=True) – Whether to use output scaling for each ensemble member.

  • ensemble_bias (bool, default=True) – Whether to use a unique bias term for each ensemble member.

  • scaling_init ({"ones", "random-signs", "normal"}, default="normal") – Initialization method for scaling weights.

  • average_ensembles (bool, default=False) – Whether to average the outputs of the ensemble members.

  • model_type ({"mini", "full"}, default="mini") – Model type to use (‘mini’ for reduced version, ‘full’ for complete model).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
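The ensemble_* options refer to batch ensembling, in which all members share one weight matrix and differ only through cheap per-member vectors. The sketch below shows one such linear layer in the common BatchEnsemble form y_k = s_k ⊙ (W (r_k ⊙ x)) + b_k and maps our reading of the parameters onto it: ensemble_scaling_in to r_k, ensemble_scaling_out to s_k, ensemble_bias to b_k, and average_ensembles to the final mean. This mapping is an assumption rather than something taken from the TabM source; scaling_init and model_type are not modelled.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def batch_ensemble_linear(x: Vector, W: Matrix, r: Matrix, s: Matrix, b: Matrix) -> List[Vector]:
    """One output vector per ensemble member k: s[k] * (W @ (r[k] * x)) + b[k]."""
    outputs = []
    for r_k, s_k, b_k in zip(r, s, b):
        scaled_in = [ri * xi for ri, xi in zip(r_k, x)]  # per-member input scaling
        hidden = matvec(W, scaled_in)                    # weights shared by all members
        outputs.append([si * hi + bi for si, hi, bi in zip(s_k, hidden, b_k)])
    return outputs


def average_members(outputs: List[Vector]) -> Vector:
    return [sum(col) / len(outputs) for col in zip(*outputs)]


W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shared 3x2 weight matrix
r = [[1.0, 1.0], [2.0, 1.0]]              # two members, input scaling
s = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]    # output scaling (identity here)
b = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]    # per-member bias
members = batch_ensemble_linear([1.0, 2.0], W, r, s, b)
print(members, average_members(members))
```

The parameter cost is one shared matrix plus a few vectors per member, which is what keeps a default of 32 members affordable.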

Examples

>>> from deeptab.models import TabMLSS
>>> model = TabMLSS(ensemble_size=32, model_type='full')
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
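The interaction of val_size, X_val/y_val and random_state described above can be sketched as follows. This is an illustrative helper in the spirit of scikit-learn's train_test_split (validation size rounded up), not deeptab's internal code; the function name is hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def train_val_split(X: Sequence, y: Sequence, val_size: float = 0.2,
                    X_val: Optional[Sequence] = None, y_val: Optional[Sequence] = None,
                    random_state: int = 101) -> Tuple[List, List, List, List]:
    """Return (X_train, y_train, X_val, y_val)."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided.")
        return list(X), list(y), list(X_val), list(y_val)  # val_size is ignored
    idx = list(range(len(X)))
    random.Random(random_state).shuffle(idx)  # same seed -> same split
    n_val = math.ceil(len(X) * val_size)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])
```

Shuffling indices rather than rows keeps each X row paired with its y value.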

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
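Encoding in batches keeps memory bounded for large inputs. The sketch below shows the batching pattern with a hypothetical embed_batch callable standing in for the model's embedding layer. It collects plain lists, whereas the real method returns a torch.Tensor (typically assembled by concatenating per-batch outputs, for example with torch.cat).

```python
from typing import Callable, Iterator, List, Optional, Sequence


def iter_batches(rows: Sequence, batch_size: int = 64) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def encode(rows: Sequence, embed_batch: Optional[Callable[[Sequence], List]],
           batch_size: int = 64) -> List:
    """Run embed_batch over rows batch by batch and concatenate the results in order."""
    if embed_batch is None:
        raise ValueError("The model or data module is not fitted.")
    encoded: List = []
    for batch in iter_batches(rows, batch_size):
        encoded.extend(embed_batch(batch))
    return encoded


seen_sizes = []


def toy_embed(batch):  # hypothetical 2-d "embedding" that records batch sizes
    seen_sizes.append(len(batch))
    return [[float(v), 2.0 * v] for v in batch]


codes = encode(list(range(150)), toy_embed, batch_size=64)
print(len(codes), seen_sizes)
```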

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a particular distribution.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Scoring metric. Currently only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
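For family='normal', the negative log-likelihood has a closed form. Assuming the model yields a mean mu_i and a standard deviation sigma_i > 0 per sample (how the raw network output is mapped to sigma is not documented here), the per-sample loss is 0.5 * log(2 * pi * sigma_i^2) + (y_i - mu_i)^2 / (2 * sigma_i^2), and the sketch averages it over samples. The function name is hypothetical.

```python
import math
from typing import Sequence


def gaussian_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean negative log-likelihood of y under Normal(mu, sigma**2)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("sigma must be strictly positive")
        total += 0.5 * math.log(2 * math.pi * si ** 2) + (yi - mi) ** 2 / (2 * si ** 2)
    return total / len(y)


# A 1-unit miss: an honest sigma=1 scores far better than an overconfident sigma=0.1.
print(gaussian_nll([1.0], [0.0], [1.0]), gaussian_nll([1.0], [0.0], [0.1]))
```

Lower is better. Unlike MSE, the score penalizes miscalibrated uncertainty as well as errors in the mean.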

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.NODEClassifier(*args: Any, **kwargs: Any)[source]

Neural Oblivious Decision Ensemble (NODE) Classifier. It differs slightly from the original NODE in that it uses an MLP as the task-specific head for tabular data. This class extends the SklearnBaseClassifier class and uses the NODE model with the default NODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
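An oblivious decision tree applies the same split at every node of a given depth, so a depth-d tree is fully described by d (feature, threshold) pairs and a table of 2^d leaf values, and routing reduces to reading the d comparisons as a binary number. The hard-threshold sketch below illustrates only that routing. NODE itself learns feature choices and thresholds through differentiable, entmax-based relaxations and combines many such trees per layer. Function names are hypothetical; depth corresponds to len(features) and tree_dim to the length of each leaf value.

```python
from typing import List, Sequence


def oblivious_leaf(x: Sequence[float], features: Sequence[int], thresholds: Sequence[float]) -> int:
    """Leaf index: one bit per depth level, most significant bit first."""
    index = 0
    for f, t in zip(features, thresholds):
        index = (index << 1) | int(x[f] > t)
    return index


def oblivious_predict(x: Sequence[float], features: Sequence[int], thresholds: Sequence[float],
                      leaf_values: List[List[float]]) -> List[float]:
    """Look up the leaf's output vector (length tree_dim)."""
    if len(leaf_values) != 2 ** len(features):
        raise ValueError("need exactly 2**depth leaf values")
    return leaf_values[oblivious_leaf(x, features, thresholds)]


features, thresholds = [0, 1], [0.5, 0.5]  # depth 2: split on x[0], then on x[1]
leaves = [[10.0], [20.0], [30.0], [40.0]]   # tree_dim = 1
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, oblivious_predict(x, features, thresholds, leaves))
```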

Examples

>>> from deeptab.models import NODEClassifier
>>> model = NODEClassifier()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
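The metrics format described above, {name: (metric_fn, needs_proba)}, can be dispatched as in the sketch below. For brevity it calls a probability function once and derives hard labels by argmax, assuming classes are encoded as 0..n_classes-1; the actual method calls predict or predict_proba as required. accuracy and log_loss are written out only to keep the example dependency-free; sklearn.metrics.accuracy_score and sklearn.metrics.log_loss accept the same (y_true, y_pred) argument order.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], labels: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, labels)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    # eps guards against log(0) when the true class has zero probability
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


def evaluate(predict_proba: Callable, X, y_true: Sequence[int],
             metrics: Dict[str, Tuple[Callable, bool]]) -> Dict[str, float]:
    proba = predict_proba(X)
    labels = [max(range(len(p)), key=p.__getitem__) for p in proba]  # argmax per row
    return {name: fn(y_true, proba if needs_proba else labels)
            for name, (fn, needs_proba) in metrics.items()}


def toy_proba(X):  # hypothetical fitted classifier: fixed probabilities for 3 rows
    return [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]


scores = evaluate(toy_proba, None, [0, 1, 1],
                  {"accuracy": (accuracy, False), "log_loss": (log_loss, True)})
print(scores)
```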

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
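To illustrate how `k_neighbors` and `temperature` enter a neighborhood-based contrastive objective, here is a small NumPy sketch. It is not deeptab's implementation. The function name `knn_contrastive_loss`, the use of cosine similarity, and the choice of positives as each row's k nearest neighbors in raw feature space are assumptions made for illustration. Lowering `temperature` sharpens the softmax over similarities.

```python
import numpy as np

def knn_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """Hypothetical InfoNCE-style loss (illustrative sketch, not deeptab's code).

    Each row's positives are its k nearest neighbors in the original feature
    space; every other row in the batch acts as a negative.
    """
    n = features.shape[0]
    k = min(k_neighbors, n - 1)

    # Pairwise Euclidean distances in feature space; a row is never its own neighbor.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # (n, k) indices of positives

    # Cosine similarities between embeddings, divided by the temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity from the softmax

    # Numerically stable row-wise log-softmax.
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))

    # Mean negative log-probability assigned to the positives.
    pos_log_prob = np.take_along_axis(log_prob, neighbors, axis=1)
    return float(-pos_log_prob.mean())
```

Embeddings that place feature-space neighbors close together yield a loss near zero, while embeddings that pull non-neighbors together yield a large loss.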

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
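The `(metric, needs_proba)` tuple convention used by `score` can be mirrored in a few lines. This is a sketch, not deeptab's code. The helper `score_with_metric_tuple` and the toy `ThresholdModel` stand-in are hypothetical, and log loss is written out by hand so the example only needs NumPy. With `True` the metric receives `predict_proba` output; with `False` it receives `predict` output.

```python
import numpy as np

def log_loss(y_true, proba, eps=1e-15):
    """Multiclass log loss; proba has shape (n_samples, n_classes)."""
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true)
    return float(-np.mean(np.log(proba[np.arange(len(y_true)), y_true])))

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

class ThresholdModel:
    """Hypothetical stand-in exposing the same predict/predict_proba interface."""

    def predict_proba(self, X):
        p1 = 1.0 / (1.0 + np.exp(-np.asarray(X, dtype=float)[:, 0]))
        return np.column_stack([1.0 - p1, p1])

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)

def score_with_metric_tuple(model, X, y, metric=(log_loss, True)):
    """Hypothetical helper illustrating the tuple convention."""
    metric_fn, needs_proba = metric
    # Probability-based metrics see predict_proba output; label-based ones see predict output.
    preds = model.predict_proba(X) if needs_proba else model.predict(X)
    return metric_fn(y, preds)
```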

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.NODERegressor(*args: Any, **kwargs: Any)[source]

Neural Oblivious Decision Ensemble (NODE) Regressor. This variant differs slightly from the original NODE by using an MLP as the tabular task-specific head. This class extends the SklearnBaseRegressor class and uses the NODE model with the default NODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
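To make `depth` and `tree_dim` concrete: in an oblivious decision tree, every node at the same depth applies the same feature and threshold, so a tree of depth d has 2**d leaves, each holding a `tree_dim`-dimensional response. The NumPy sketch below uses hard threshold splits for readability. NODE itself replaces them with differentiable (entmax-based) feature selection and soft splits, so treat this only as an illustration of the tree structure. All names are hypothetical.

```python
import numpy as np

def oblivious_tree_forward(X, feature_idx, thresholds, leaf_values):
    """Hard-split oblivious tree (illustrative sketch, not NODE's soft version).

    X:            (n_samples, n_features)
    feature_idx:  (depth,) feature tested at each level, shared by all nodes on that level
    thresholds:   (depth,) threshold used at each level
    leaf_values:  (2**depth, tree_dim) response stored in each leaf
    """
    X = np.asarray(X, dtype=float)
    feature_idx = np.asarray(feature_idx)
    thresholds = np.asarray(thresholds, dtype=float)
    leaf_values = np.asarray(leaf_values, dtype=float)
    depth = len(feature_idx)
    if leaf_values.shape[0] != 2 ** depth:
        raise ValueError("leaf_values must have 2**depth rows")

    # Each level contributes one bit: 1 if the sample goes right at that level.
    bits = (X[:, feature_idx] > thresholds).astype(int)   # (n_samples, depth)
    # Interpret the bits as a binary number (first level = most significant bit).
    powers = 2 ** np.arange(depth - 1, -1, -1)
    leaf_index = bits @ powers                             # (n_samples,)
    return leaf_values[leaf_index]                         # (n_samples, tree_dim)
```

Because all nodes on a level share one comparison, the leaf index is just the vector of `depth` comparison outcomes read as a binary number.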

Examples

>>> from deeptab.models import NODERegressor
>>> model = NODERegressor()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object
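The interplay of `val_size`, `X_val`/`y_val`, and `random_state` described above can be summarized with a short sketch. It assumes a simple seeded random row split; deeptab's internal splitting code is not shown on this page and may differ (for example, in how it shuffles or stratifies). `split_train_val` is a hypothetical helper.

```python
import numpy as np

def split_train_val(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Hypothetical helper: return (X_train, y_train, X_val, y_val)."""
    X, y = np.asarray(X), np.asarray(y)
    if X_val is not None:
        # An explicit validation set takes precedence; val_size is ignored.
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided.")
        return X, y, np.asarray(X_val), np.asarray(y_val)

    # Otherwise hold out a seeded random fraction of the rows for validation.
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(X))
    n_val = int(round(val_size * len(X)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Fixing `random_state` makes the held-out rows reproducible across calls.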

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
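As the Notes state, `evaluate` generates predictions with `predict` and then applies each metric. The pure-Python sketch below mirrors that loop for a regressor. `evaluate_predictions` is hypothetical and takes precomputed predictions rather than a fitted model, and the metric functions are written out by hand. Any callable with the signature `metric(y_true, y_pred)` fits the same slot.

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_predictions(y_true, y_pred, metrics):
    """Hypothetical helper: apply every metric (name -> callable) and collect the scores."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}

scores = evaluate_predictions(
    [1.0, 2.0, 3.0],
    [1.5, 2.0, 2.0],
    {"mse": mse, "mae": mae, "rmse": lambda t, p: math.sqrt(mse(t, p))},
)
```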

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
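Taken together, `patience`, `monitor`/`mode`, `lr_patience`, and `lr_factor` describe an early-stopping rule plus a reduce-on-plateau learning-rate schedule. The pure-Python simulation below replays a sequence of monitored validation values to show how the two counters interact. It assumes semantics similar to PyTorch Lightning's `EarlyStopping` (stop once the no-improvement count reaches `patience`) and PyTorch's `ReduceLROnPlateau` (reduce once the count exceeds `lr_patience`). That deeptab wires these up exactly this way is an assumption, and improvement thresholds and cooldown are ignored. `simulate_training` and its default values are hypothetical.

```python
def simulate_training(val_losses, lr=1e-3, patience=15, lr_patience=10, lr_factor=0.1, mode="min"):
    """Hypothetical replay of per-epoch monitored values; returns (stop_epoch, lr_history).

    Default values here are illustrative only.
    """
    better = (lambda a, b: a < b) if mode == "min" else (lambda a, b: a > b)
    best = None
    epochs_since_best = 0      # early-stopping counter
    epochs_since_lr_best = 0   # plateau counter for learning-rate reduction
    lr_history = []
    for epoch, value in enumerate(val_losses):
        if best is None or better(value, best):
            best = value
            epochs_since_best = 0
            epochs_since_lr_best = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_best += 1
        # Reduce the LR once the metric has stagnated for more than lr_patience epochs.
        if epochs_since_lr_best > lr_patience:
            lr *= lr_factor
            epochs_since_lr_best = 0
        lr_history.append(lr)
        # Stop once there has been no improvement for `patience` epochs.
        if epochs_since_best >= patience:
            return epoch, lr_history
    return len(val_losses) - 1, lr_history
```

A smaller `lr_patience` than `patience` gives the model a chance to recover at a lower learning rate before training is stopped.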

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.NODELSS(*args: Any, **kwargs: Any)[source]

Neural Oblivious Decision Ensemble (NODE) for distributional regression. This variant differs slightly from the original NODE by using an MLP as the tabular task-specific head. This class extends the SklearnBaseLSS class and uses the NODE model with the default NODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
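With the default `numerical_preprocessing="ple"`, a numerical column is first cut into `n_bins` intervals and then expanded with a piecewise linear encoding (PLE). The NumPy sketch below shows the two `binning_strategy` options ("uniform" gives equal-width bins, "quantile" gives equal-frequency bins) followed by a PLE in which component t is 0 below bin t, 1 above it, and rises linearly inside it. This is an illustration, not deeptab's preprocessing code, and the helper names are hypothetical.

```python
import numpy as np

def bin_edges(x, n_bins=4, strategy="uniform"):
    """Hypothetical helper: n_bins + 1 edges covering the observed range of x."""
    x = np.asarray(x, dtype=float)
    if strategy == "uniform":
        return np.linspace(x.min(), x.max(), n_bins + 1)          # equal-width bins
    if strategy == "quantile":
        return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))  # equal-frequency bins
    raise ValueError(f"unknown strategy: {strategy}")

def piecewise_linear_encode(x, edges):
    """Hypothetical PLE sketch: returns an array of shape (n_samples, n_bins)."""
    x = np.asarray(x, dtype=float)[:, None]
    edges = np.asarray(edges, dtype=float)
    left, right = edges[:-1], edges[1:]
    safe_width = np.where(right > left, right - left, 1.0)
    ramp = np.clip((x - left) / safe_width, 0.0, 1.0)
    # Zero-width bins (possible with repeated quantiles) become a simple step.
    return np.where(right > left, ramp, (x >= right).astype(float))
```

On skewed data the two strategies differ sharply: quantile edges crowd into dense regions and can even coincide, which is why the sketch guards against zero-width bins.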

Examples

>>> from deeptab.models import NODELSS
>>> model = NODELSS()
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The scoring metric. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
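For `family='normal'`, the per-sample negative log-likelihood is -log N(y | mu, sigma^2) = 0.5 * log(2 * pi * sigma^2) + (y - mu)^2 / (2 * sigma^2). The sketch below averages it over samples from predicted means and standard deviations. It assumes the model yields one (mean, standard deviation) pair per sample and that the score is a mean over samples; deeptab's exact parameterization and reduction are not documented on this page. `gaussian_nll` is hypothetical.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Hypothetical helper: mean NLL of y under N(mu, sigma**2) with per-sample parameters."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("standard deviations must be positive")
        # Normalizing term plus squared error scaled by the predicted variance.
        total += 0.5 * math.log(2 * math.pi * si ** 2) + (yi - mi) ** 2 / (2 * si ** 2)
    return total / len(y)
```

Unlike mean squared error, this score also penalizes poorly calibrated uncertainty: a small sigma on a large error is costly, and so is an unnecessarily large sigma on an accurate mean.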

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.NDTFClassifier(*args: Any, **kwargs: Any)[source]

Neural Decision Forest classifier. This class extends the SklearnBaseClassifier class and uses the NDTF model with the default NDTF configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Neural Decision Tree Forest (NDTF) model with predefined hyperparameters.

Parameters:
  • min_depth (int, default=2) – Minimum depth of trees in the forest. Controls the simplest model structure.

  • max_depth (int, default=10) – Maximum depth of trees in the forest. Controls the maximum complexity of the trees.

  • temperature (float, default=0.1) – Temperature parameter for softening the node decisions during path probability calculation.

  • node_sampling (float, default=0.3) – Fraction of nodes sampled for regularization penalty calculation. Reduces computation by focusing on a subset of nodes.

  • lamda (float, default=0.3) – Regularization parameter to control the complexity of the paths, penalizing overconfident or imbalanced paths.

  • n_ensembles (int, default=12) – Number of trees in the forest.

  • penalty_factor (float, default=0.01) – Factor by which the penalty is multiplied.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
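The temperature parameter above softens each routing decision. To show the general idea, the sketch below uses a generic soft decision tree, not deeptab's NDTF code. It assumes that each internal node emits a logit and routes left with probability sigmoid(logit / temperature). The probability of reaching a leaf is then the product of the decisions along its root-to-leaf path. The helper name leaf_path_probabilities and the heap-ordered node layout are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def leaf_path_probabilities(node_logits: List[float], depth: int, temperature: float) -> List[float]:
    """Hypothetical helper: probability of reaching each leaf of a complete soft tree.

    node_logits holds one logit per internal node in heap order
    (root at index 0, children of node i at 2*i+1 and 2*i+2).
    """
    if len(node_logits) != 2**depth - 1:
        raise ValueError("expected 2**depth - 1 internal-node logits")
    probs = [1.0]  # all probability mass starts at the root
    for level in range(depth):
        first = 2**level - 1  # heap index of the first node on this level
        next_probs = []
        for offset, p in enumerate(probs):
            # Lower temperature -> sharper (more tree-like) decisions.
            p_left = sigmoid(node_logits[first + offset] / temperature)
            next_probs.extend([p * p_left, p * (1.0 - p_left)])
        probs = next_probs
    return probs  # one entry per leaf; the entries sum to 1
```

With a small temperature, a root logit of 1.0 sends almost all mass to the left subtree. With a large temperature, the same logit splits the mass nearly evenly.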

Examples

>>> from deeptab.models import NDTFClassifier
>>> model = NDTFClassifier(n_ensembles=12, max_depth=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
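The val_size, X_val, and random_state parameters follow a simple rule. If an explicit validation set is provided, it takes precedence. Otherwise, a seeded random split is drawn from X and y. The stdlib-only sketch below mirrors that rule. split_train_val is a hypothetical helper, and the library's internal split may differ in details such as rounding or stratification.

```python
import random
from typing import Optional, Sequence, Tuple


def split_train_val(X: Sequence, y: Sequence, val_size: float = 0.2,
                    X_val: Optional[Sequence] = None, y_val: Optional[Sequence] = None,
                    random_state: int = 101) -> Tuple[list, list, list, list]:
    """Hypothetical helper mirroring the documented val_size / X_val rule."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        # Explicit validation data: train on all of X, y and ignore val_size.
        return list(X), list(y), list(X_val), list(y_val)
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # seeded shuffle before splitting
    n_val = int(round(len(indices) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx])
```

Because the shuffle is seeded, the same random_state always produces the same split.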

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
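For the classifier, each value in metrics is a (metric_function, needs_proba) tuple. The sketch below shows the dispatch rule from the Notes. Tuples flagged True receive probabilities, and the rest receive class labels. evaluate_with and the two stand-in prediction functions are hypothetical. The metrics are written in plain Python so the example has no dependencies. In practice you would pass functions such as sklearn.metrics.accuracy_score and sklearn.metrics.log_loss.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    # Mean negative log-probability assigned to the true class.
    eps = 1e-15
    return -sum(math.log(max(row[t], eps)) for t, row in zip(y_true, proba)) / len(y_true)


def evaluate_with(predict: Callable, predict_proba: Callable, X, y_true,
                  metrics: Dict[str, Tuple[Callable, bool]]) -> Dict[str, float]:
    """Hypothetical re-implementation of the documented dispatch rule."""
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        preds = predict_proba(X) if needs_proba else predict(X)
        scores[name] = metric_fn(y_true, preds)
    return scores


def fake_predict_proba(X):
    # Stand-in for a fitted classifier's predict_proba (two classes, three samples).
    return [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]


def fake_predict(X):
    # Stand-in for predict: the arg-max class of each probability row.
    return [max(range(len(row)), key=row.__getitem__) for row in fake_predict_proba(X)]


scores = evaluate_with(fake_predict, fake_predict_proba, X=None, y_true=[0, 1, 1],
                       metrics={"accuracy": (accuracy, False), "log_loss": (log_loss, True)})
```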

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
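patience, monitor, and mode describe early stopping. lr_patience and lr_factor describe a reduce-on-plateau learning-rate schedule. Training is delegated to PyTorch Lightning, so the exact semantics are those of the callbacks it uses. The plain-Python simulation below only illustrates the bookkeeping. It assumes mode='min', ignores the small improvement thresholds real schedulers apply, and is loosely modeled on Lightning's EarlyStopping and PyTorch's ReduceLROnPlateau. simulate_training is a hypothetical name.

```python
from typing import List, Tuple


def simulate_training(val_losses: List[float], patience: int = 15, lr: float = 1e-3,
                      lr_patience: int = 10, lr_factor: float = 0.1) -> Tuple[int, float]:
    """Return (epochs_run, final_lr) for a sequence of per-epoch validation losses.

    Hypothetical sketch assuming mode='min': an epoch improves when its loss is
    strictly lower than the best loss seen so far.
    """
    best = float("inf")
    epochs_since_best = 0      # counter for early stopping
    epochs_since_lr_drop = 0   # counter for the learning-rate scheduler
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_since_best = 0
            epochs_since_lr_drop = 0
        else:
            epochs_since_best += 1
            epochs_since_lr_drop += 1
            if epochs_since_lr_drop > lr_patience:
                lr *= lr_factor  # plateau detected: reduce the learning rate
                epochs_since_lr_drop = 0
            if epochs_since_best >= patience:
                return epoch, lr  # stop early
    return len(val_losses), lr
```

When lr_patience is smaller than patience, the learning rate is reduced at least once before training stops. This gives the optimizer a chance to escape the plateau.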

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
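The exact pruning rule is not spelled out above, so the following is only one plausible reading of the two modes. With prune_by_epoch=True, a trial is compared against the best earlier trial at epoch prune_epoch. With prune_by_epoch=False, the two runs are compared on their best validation loss. Both the "worse than the best earlier trial" criterion and the function name should_prune are assumptions for illustration.

```python
from typing import List


def should_prune(trial_losses: List[float], best_previous_losses: List[float],
                 prune_by_epoch: bool = True, prune_epoch: int = 5) -> bool:
    """Hypothetical pruning rule; both lists hold per-epoch validation losses."""
    if prune_by_epoch:
        idx = prune_epoch - 1  # epochs are 1-based in this sketch
        if len(trial_losses) <= idx or len(best_previous_losses) <= idx:
            return False  # not enough history to compare yet
        return trial_losses[idx] > best_previous_losses[idx]
    # Otherwise compare the best validation loss reached by each run.
    return min(trial_losses) > min(best_previous_losses)
```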

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
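The pretraining objective is described only at a high level. The sketch below illustrates a neighborhood-based contrastive loss. It uses an InfoNCE-style formulation chosen for this example, which is not necessarily deeptab's exact loss. Each sample's k_neighbors nearest neighbors in the raw feature space are treated as positives, and cosine similarities between embeddings are divided by temperature. The use_positive, use_negative, and pool_sequence switches are not modeled, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_indices(features: List[Vector], i: int, k: int) -> List[int]:
    # Indices of the k nearest neighbors of sample i (Euclidean, excluding i itself).
    dists = [(math.dist(features[i], f), j) for j, f in enumerate(features) if j != i]
    return [j for _, j in sorted(dists)[:k]]


def neighbor_contrastive_loss(features: List[Vector], embeddings: List[Vector],
                              k_neighbors: int = 10, temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE-style loss pulling embeddings toward feature-space neighbors."""
    n = len(features)
    total = 0.0
    for i in range(n):
        positives = knn_indices(features, i, k_neighbors)
        # Temperature-scaled similarity of sample i to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        denom = sum(math.exp(v) for v in logits.values())
        pos = sum(math.exp(logits[j]) for j in positives)
        total += -math.log(pos / denom)
    return total / n
```

Embeddings that keep feature-space neighbors close together give a lower loss than embeddings that scatter them.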

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.NDTFRegressor(*args: Any, **kwargs: Any)[source]

Neural Decision Forest regressor. This class extends the SklearnBaseRegressor class and uses the NDTF model with the default NDTF configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Neural Decision Tree Forest (NDTF) model with predefined hyperparameters.

Parameters:
  • min_depth (int, default=2) – Minimum depth of trees in the forest. Controls the simplest model structure.

  • max_depth (int, default=10) – Maximum depth of trees in the forest. Controls the maximum complexity of the trees.

  • temperature (float, default=0.1) – Temperature parameter for softening the node decisions during path probability calculation.

  • node_sampling (float, default=0.3) – Fraction of nodes sampled for regularization penalty calculation. Reduces computation by focusing on a subset of nodes.

  • lamda (float, default=0.3) – Regularization parameter to control the complexity of the paths, penalizing overconfident or imbalanced paths.

  • n_ensembles (int, default=12) – Number of trees in the forest.

  • penalty_factor (float, default=0.01) – Factor by which the penalty is multiplied.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
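n_bins and binning_strategy determine the bin edges used by binning-based encodings such as PLE. Under the usual meaning of the two options, "uniform" places equal-width edges between the minimum and maximum. "quantile" places edges at empirical quantiles, so each bin holds roughly the same number of samples. The stdlib-only sketch below contrasts the two strategies. bin_edges is a hypothetical helper, and tree-based edges (use_decision_tree_bins=True) are not modeled.

```python
from statistics import quantiles
from typing import List, Sequence


def bin_edges(values: Sequence[float], n_bins: int = 64, strategy: str = "uniform") -> List[float]:
    """Hypothetical helper returning n_bins + 1 edges for one numerical feature."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins)] + [hi]
    if strategy == "quantile":
        # statistics.quantiles returns the n_bins - 1 interior cut points.
        inner = quantiles(values, n=n_bins, method="inclusive")
        return [lo, *inner, hi]
    raise ValueError(f"unknown strategy: {strategy!r}")
```

On skewed data such as [1, ..., 9, 100], the uniform edges leave most samples in the first bin. The quantile edges keep resolution where the samples actually lie.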

Examples

>>> from deeptab.models import NDTFRegressor
>>> model = NDTFRegressor(n_ensembles=12, max_depth=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
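For the regressor, each value in metrics is a plain callable. It is applied to y_true and to the output of predict. The sketch below reproduces that loop with two hand-written metrics so it runs without extra dependencies. Callables such as sklearn.metrics.mean_squared_error and sklearn.metrics.mean_absolute_error fit the same metric(y_true, y_pred) signature. evaluate_regression and the fixed predictions are hypothetical.

```python
from typing import Callable, Dict, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_regression(y_true: Sequence[float], y_pred: Sequence[float],
                        metrics: Dict[str, Callable]) -> Dict[str, float]:
    """Hypothetical stand-in: apply every metric to the same predictions."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]  # e.g. the output of model.predict(X_test)
scores = evaluate_regression(y_true, y_pred, {"mse": mse, "mae": mae})
```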

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
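save writes a single bundle holding the config, the fitted preprocessor, feature metadata, and the network weights. load rebuilds an estimator from that bundle. With the library itself, usage is model.save("model.pt") followed by NDTFRegressor.load("model.pt"). The stdlib sketch below only illustrates the round-trip pattern with pickle and a plain dictionary. The bundle keys, their contents, and the use of pickle are assumptions for illustration, not deeptab's on-disk format.

```python
import os
import pickle
import tempfile

# Hypothetical bundle layout mirroring the four components listed for save().
bundle = {
    "config": {"n_ensembles": 12, "max_depth": 8},
    "preprocessor": {"numerical_preprocessing": "ple", "n_bins": 64},
    "feature_metadata": {"numerical": ["age", "income"], "categorical": ["city"]},
    "state_dict": {"head.weight": [[0.1, -0.2]], "head.bias": [0.05]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)      # save(): write everything needed for inference
    with open(path, "rb") as fh:
        restored = pickle.load(fh)   # load(): read the bundle back

# Every component required for inference should survive the round trip.
missing = {"config", "preprocessor", "feature_metadata", "state_dict"} - restored.keys()
```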

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.NDTFLSS(*args: Any, **kwargs: Any)[source]

Neural Decision Forest for distributional regression. This class extends the SklearnBaseLSS class and uses the NDTF model with the default NDTF configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default Neural Decision Tree Forest (NDTF) model with predefined hyperparameters.

Parameters:
  • min_depth (int, default=2) – Minimum depth of trees in the forest. Controls the simplest model structure.

  • max_depth (int, default=10) – Maximum depth of trees in the forest. Controls the maximum complexity of the trees.

  • temperature (float, default=0.1) – Temperature parameter for softening the node decisions during path probability calculation.

  • node_sampling (float, default=0.3) – Fraction of nodes sampled for regularization penalty calculation. Reduces computation by focusing on a subset of nodes.

  • lamda (float, default=0.3) – Regularization parameter to control the complexity of the paths, penalizing overconfident or imbalanced paths.

  • n_ensembles (int, default=12) – Number of trees in the forest.

  • penalty_factor (float, default=0.01) – Factor by which the penalty is multiplied.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
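The default numerical_preprocessing="ple" refers to piecewise linear encoding. Given sorted bin edges b_0 < b_1 < ... < b_T, for example derived from n_bins and binning_strategy, a value x becomes a T-dimensional vector. The t-th entry records how far x has progressed through bin t, clipped to [0, 1]. The sketch below uses that simplified, fully clipped variant. The original formulation leaves the first and last components unclipped on one side, and deeptab's encoder may differ. ple_encode is a hypothetical name.

```python
from typing import List, Sequence


def ple_encode(x: float, edges: Sequence[float]) -> List[float]:
    """Hypothetical piecewise linear encoding of one value given sorted bin edges."""
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        frac = (x - left) / (right - left)          # progress through this bin
        encoding.append(min(max(frac, 0.0), 1.0))  # clip to [0, 1]
    return encoding
```

Unlike one-hot binning, this encoding keeps order and distance information. Bins below x are full (1.0), bins above are empty (0.0), and only the bin containing x takes a fractional value.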

Examples

>>> from deeptab.models import NDTFLSS
>>> model = NDTFLSS(n_ensembles=12, max_depth=8)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
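The patience, monitor and mode arguments together define early stopping. The sketch below mimics the usual rule (as in PyTorch Lightning's EarlyStopping callback, ignoring its min_delta threshold): training stops once the monitored value has failed to improve for `patience` consecutive epochs. `EarlyStoppingSketch` is hypothetical; deeptab presumably configures a trainer callback from these arguments, and its exact behavior may differ.

```python
from typing import Optional


class EarlyStoppingSketch:
    """Hypothetical early-stopping rule driven by patience and mode."""

    def __init__(self, patience: int = 15, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best: Optional[float] = None
        self.wait = 0

    def _improved(self, value: float) -> bool:
        if self.best is None:
            return True
        return value < self.best if self.mode == "min" else value > self.best

    def step(self, value: float) -> bool:
        """Record one epoch's monitored value; return True if training should stop."""
        if self._improved(value):
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


# Monitored 'val_loss' per epoch; with patience=2 training stops at epoch index 3.
stopper = EarlyStoppingSketch(patience=2, mode="min")
val_losses = [1.0, 0.9, 0.95, 0.96, 0.97]
stopped_at = next(i for i, v in enumerate(val_losses) if stopper.step(v))
print(stopped_at, stopper.best)  # 3 0.9
```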

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
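In PyTorch, the trainable-parameter count is conventionally `sum(p.numel() for p in model.parameters() if p.requires_grad)`. To keep the example runnable without torch, the sketch below applies the same rule to a hypothetical list of (shape, trainable) pairs; the layer sizes are made up for illustration and `number_of_params` is not a deeptab function.

```python
import math
from typing import List, Tuple

# Hypothetical parameter list: (tensor shape, requires_grad).
# A Linear(64 -> 32) layer with bias, plus a frozen 10 x 64 embedding table.
params: List[Tuple[Tuple[int, ...], bool]] = [
    ((32, 64), True),   # linear weight
    ((32,), True),      # linear bias
    ((10, 64), False),  # frozen embedding weight
]


def number_of_params(params: List[Tuple[Tuple[int, ...], bool]], requires_grad: bool = True) -> int:
    """Count elements, optionally only those that would receive gradients."""
    return sum(
        math.prod(shape)
        for shape, trainable in params
        if trainable or not requires_grad
    )


print(number_of_params(params))                       # 2080 trainable
print(number_of_params(params, requires_grad=False))  # 2720 in total
```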

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The metric to use. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
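For the 'normal' family, the NLL metric is the negative log-density of the targets under the predicted Gaussians, averaged over samples: NLL = mean of 0.5 * log(2 * pi * sigma_i^2) + (y_i - mu_i)^2 / (2 * sigma_i^2). The sketch below computes that formula in plain Python. `normal_nll` is a hypothetical helper that assumes the predictions have already been converted to a mean and a positive standard deviation per sample; deeptab's own implementation may differ in reduction or parameterization.

```python
import math
from typing import Sequence


def normal_nll(y: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Mean Gaussian negative log-likelihood (hypothetical helper)."""
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("standard deviations must be positive")
        total += 0.5 * math.log(2 * math.pi * si ** 2) + (yi - mi) ** 2 / (2 * si ** 2)
    return total / len(y)


# A perfect mean with unit variance still pays the normalising constant 0.5*log(2*pi).
print(round(normal_nll([0.0], [0.0], [1.0]), 4))  # 0.9189
```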

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.SAINTClassifier(*args: Any, **kwargs: Any)[source]

SAINT classifier. This class extends the SklearnBaseClassifier class and uses the SAINT model with the default SAINT configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the SAINT model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
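In standard multi-head attention, d_model is split evenly across the heads, so it should be divisible by n_heads (PyTorch's nn.MultiheadAttention, for instance, rejects other combinations). The quick check below shows the per-head width for the defaults and for d_model=64 with the default n_heads; `head_dim` is a hypothetical helper, and this assumes SAINT's attention follows the standard even split.

```python
def head_dim(d_model: int, n_heads: int) -> int:
    """Per-head width under an even split of d_model across attention heads."""
    if d_model % n_heads != 0:
        raise ValueError(f"d_model={d_model} is not divisible by n_heads={n_heads}")
    return d_model // n_heads


print(head_dim(128, 8))  # defaults: 16 dimensions per head
print(head_dim(64, 8))   # d_model=64 with the default n_heads: 8 per head
```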

Examples

>>> from deeptab.models import SAINTClassifier
>>> model = SAINTClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
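When X_val is None, val_size and random_state determine a reproducible hold-out split. The sketch below shows the idea with the standard library; `holdout_indices` is hypothetical, and rounding the validation count up mirrors scikit-learn's train_test_split convention, which deeptab may or may not follow.

```python
import math
import random
from typing import List, Tuple


def holdout_indices(n_samples: int, val_size: float = 0.2, random_state: int = 101) -> Tuple[List[int], List[int]]:
    """Hypothetical split: shuffle indices with a seeded RNG, then carve off a validation slice."""
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be in (0, 1)")
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # same seed -> same split
    n_val = math.ceil(val_size * n_samples)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = holdout_indices(1000, val_size=0.2, random_state=101)
print(len(train_idx), len(val_idx))  # 800 200
```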

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – List or array with embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
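The metrics argument pairs each metric function with a flag saying whether it needs probabilities. The sketch below shows how such a dictionary can be dispatched to predict or predict_proba; `evaluate_sketch` and `ToyClassifier` are hypothetical stand-ins, and the metric functions are plain-Python versions of accuracy and the binary Brier score rather than the scikit-learn ones.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (metric function, needs_proba) pairs, as described for the metrics argument.
Metric = Tuple[Callable[..., float], bool]


class ToyClassifier:
    """Hypothetical fitted binary classifier; P(class 1) is the clipped first feature."""

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        out = []
        for x in X:
            p1 = min(max(x[0], 0.0), 1.0)
            out.append([1.0 - p1, p1])
        return out

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [int(p[1] >= 0.5) for p in self.predict_proba(X)]


def accuracy(y_true, y_pred) -> float:
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def brier(y_true, proba) -> float:
    # Binary Brier score computed on the positive-class probability.
    return sum((p[1] - y) ** 2 for y, p in zip(y_true, proba)) / len(y_true)


def evaluate_sketch(model, X, y_true, metrics: Dict[str, Metric]) -> Dict[str, float]:
    scores = {}
    for name, (fn, needs_proba) in metrics.items():
        # Probability-based metrics get predict_proba output, the rest get labels.
        preds = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = fn(y_true, preds)
    return scores


X = [[0.1], [0.4], [0.8], [0.9]]
y = [0, 1, 1, 1]
scores = evaluate_sketch(ToyClassifier(), X, y, {"accuracy": (accuracy, False), "brier": (brier, True)})
print(scores)  # accuracy 0.75; brier (0.01 + 0.36 + 0.04 + 0.01) / 4 = 0.105
```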

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
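The lr_patience and lr_factor arguments describe a reduce-on-plateau schedule. The sketch below follows the convention of PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau (the learning rate is multiplied by the factor once the number of epochs without improvement exceeds the patience), simplified by treating any strict decrease as an improvement and omitting cooldown and min_lr. `PlateauSketch` is hypothetical and not part of deeptab.

```python
from typing import List, Optional


class PlateauSketch:
    """Hypothetical reduce-on-plateau rule for a loss that should decrease."""

    def __init__(self, lr: float, lr_factor: float = 0.1, lr_patience: int = 10) -> None:
        self.lr = lr
        self.lr_factor = lr_factor
        self.lr_patience = lr_patience
        self.best: Optional[float] = None
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the learning rate to use next."""
        if self.best is None or val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.lr_patience:
            self.lr *= self.lr_factor
            self.bad_epochs = 0
        return self.lr


sched = PlateauSketch(lr=1e-3, lr_factor=0.1, lr_patience=1)
history: List[float] = [sched.step(v) for v in [1.0, 1.1, 1.2, 1.3]]
print(history)  # the rate drops from 1e-3 to about 1e-4 at the third epoch
```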

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
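For a classifier, predict and predict_proba are usually consistent in the sense that each predicted label is the column with the highest probability. The sketch below shows that relationship; `labels_from_proba` and the `classes` list are hypothetical, and it assumes deeptab follows this argmax convention.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]], classes: Sequence[str]) -> List[str]:
    """Map each row of class probabilities to the label with the largest probability."""
    labels = []
    for row in proba:
        best = max(range(len(row)), key=lambda j: row[j])  # argmax; ties go to the first column
        labels.append(classes[best])
    return labels


proba = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # shape (n_samples, n_classes)
print(labels_from_proba(proba, ["cat", "dog", "bird"]))  # ['cat', 'bird']
```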

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
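The pretraining description combines neighbors in feature space with a temperature-scaled contrastive loss. The sketch below shows one common way to combine those ingredients: an InfoNCE-style loss where each sample's k nearest neighbors (in the raw features) act as positives and the temperature scales cosine similarities before the softmax. It is an illustration under that assumption, not deeptab's implementation; `knn_indices` and `contrastive_loss` are hypothetical names, and the use_negative and pool_sequence options are not modeled.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def knn_indices(X: Sequence[Vec], k: int) -> List[List[int]]:
    """Indices of the k nearest other rows of X by Euclidean distance."""
    out = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: math.dist(xi, X[j]))
        out.append(others[:k])
    return out


def cosine(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def contrastive_loss(Z: Sequence[Vec], neighbors: List[List[int]], temperature: float = 0.1) -> float:
    """Mean InfoNCE-style loss: neighbors are positives, all other rows form the denominator."""
    total = 0.0
    for i, pos in enumerate(neighbors):
        sims = {j: math.exp(cosine(Z[i], Z[j]) / temperature) for j in range(len(Z)) if j != i}
        total += -math.log(sum(sims[j] for j in pos) / sum(sims.values()))
    return total / len(Z)


X = [[0.0], [1.0], [10.0], [11.0]]  # raw features: two tight pairs
nbrs = knn_indices(X, k=1)          # [[1], [0], [3], [2]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(contrastive_loss(aligned, nbrs) < contrastive_loss(shuffled, nbrs))  # True
```

Embeddings that keep feature-space neighbors close score a lower loss, which is the pressure pretraining applies to the embedding layer.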

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
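The default metric, (log_loss, True), scores predict_proba output with scikit-learn's log loss, which is the mean of -log p(true class). The plain-Python sketch below reproduces that formula; `log_loss_sketch` is a hypothetical helper that assumes integer labels 0..K-1 indexing the probability columns and clips probabilities with a small eps as an illustrative safeguard (scikit-learn's own clipping and renormalization details differ).

```python
import math
from typing import Sequence


def log_loss_sketch(y_true: Sequence[int], proba: Sequence[Sequence[float]], eps: float = 1e-15) -> float:
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)  # avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


y = [0, 1]
proba = [[0.9, 0.1], [0.2, 0.8]]
print(round(log_loss_sketch(y, proba), 4))  # 0.1643
```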

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.SAINTRegressor(*args: Any, **kwargs: Any)[source]

SAINT regressor. This class extends the SklearnBaseRegressor class and uses the SAINT model with the default SAINT configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the SAINT model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
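With numerical_preprocessing="ple", each numerical feature is expanded into piecewise linear encoding (PLE) components, one per bin. Under the usual definition, a component is 1 for bins entirely below the value, the fractional position of the value inside its own bin, and 0 for bins above it. The sketch below encodes a single value with equal-width bin edges, matching binning_strategy="uniform"; `uniform_edges` and `ple_encode` are hypothetical helpers and may differ from deeptab's preprocessor in edge handling.

```python
from typing import List


def uniform_edges(lo: float, hi: float, n_bins: int) -> List[float]:
    """n_bins equal-width bins over [lo, hi] -> n_bins + 1 edges."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def ple_encode(x: float, edges: List[float]) -> List[float]:
    """Piecewise linear encoding of one value: one component per bin, clipped to [0, 1]."""
    out = []
    for left, right in zip(edges[:-1], edges[1:]):
        frac = (x - left) / (right - left)
        out.append(min(max(frac, 0.0), 1.0))
    return out


edges = uniform_edges(0.0, 3.0, n_bins=3)  # [0.0, 1.0, 2.0, 3.0]
print(ple_encode(1.5, edges))              # [1.0, 0.5, 0.0]
```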

Examples

>>> from deeptab.models import SAINTRegressor
>>> model = SAINTRegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g., torchmetrics metrics) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
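Unlike the classifier's evaluate, the regressor's metrics values are plain functions applied to the true values and the output of predict. The sketch below illustrates that shape with two hand-written metrics; `evaluate_regression_sketch`, `mse` and `mae` are hypothetical, and functions such as sklearn.metrics.mean_squared_error fit the same (y_true, y_pred) signature.

```python
from typing import Callable, Dict, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def evaluate_regression_sketch(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    metrics: Dict[str, Callable[[Sequence[float], Sequence[float]], float]],
) -> Dict[str, float]:
    """Apply every metric to the same predictions, keyed by metric name."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


scores = evaluate_regression_sketch([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], {"mse": mse, "mae": mae})
print(scores)  # mse = 1.25 / 3, mae = 1.5 / 3 = 0.5
```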

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
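
The patience, lr_patience and lr_factor arguments interact during training. Early stopping ends training after patience epochs without improvement on the monitored metric. The learning-rate settings presumably drive a reduce-on-plateau scheduler, which is an assumption, since the scheduler class is not named here. The sketch below replays a list of validation losses through simplified versions of both rules. simulate_training is a hypothetical helper, and the small patience values keep the trace short. Real callbacks such as PyTorch's ReduceLROnPlateau and Lightning's EarlyStopping add details like improvement thresholds.

```python
def simulate_training(val_losses, lr=1e-3, patience=3, lr_patience=1, lr_factor=0.1):
    """Replay per-epoch validation losses through simplified early-stopping
    and reduce-on-plateau rules (monitor="val_loss", mode="min").

    Returns (stop_epoch, best_loss, lr_history), where lr_history[i] is the
    learning rate used during epoch i.
    """
    best = float("inf")
    epochs_without_improvement = 0  # counter for early stopping
    bad_epochs_since_lr_change = 0  # counter for learning-rate reduction
    lr_history = []
    for epoch, loss in enumerate(val_losses):
        lr_history.append(lr)
        if loss < best:
            # Improvement: remember it and reset both counters.
            best = loss
            epochs_without_improvement = 0
            bad_epochs_since_lr_change = 0
            continue
        epochs_without_improvement += 1
        bad_epochs_since_lr_change += 1
        # Cut the learning rate once the plateau outlasts lr_patience epochs.
        if bad_epochs_since_lr_change > lr_patience:
            lr *= lr_factor
            bad_epochs_since_lr_change = 0
        # Stop after `patience` consecutive epochs without improvement.
        if epochs_without_improvement >= patience:
            return epoch, best, lr_history
    return len(val_losses) - 1, best, lr_history


stop_epoch, best_loss, lrs = simulate_training([1.0, 0.8, 0.85, 0.9, 0.95, 0.7])
print(stop_epoch, best_loss, lrs)
```

Here training stops at epoch 4 (zero-based), so the later improvement to 0.7 is never seen. This is the trade-off a small patience makes.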

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
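
In PyTorch, the usual idiom for this count is sum(p.numel() for p in model.parameters() if p.requires_grad). Whether deeptab counts exactly this way is an assumption. The dependency-free sketch below mimics the idiom on a hypothetical list of tensor names, shapes and requires_grad flags. count_params is an illustrative helper.

```python
import math

# Hypothetical parameter inventory: (name, shape, requires_grad).
parameters = [
    ("embedding.weight", (10, 64), False),  # a frozen tensor
    ("layer.weight", (64, 64), True),
    ("layer.bias", (64,), True),
    ("head.weight", (1, 64), True),
]


def count_params(params, requires_grad=True):
    """Total number of elements, optionally only over trainable tensors."""
    return sum(
        math.prod(shape)
        for _name, shape, trainable in params
        if trainable or not requires_grad
    )


print(count_params(parameters))                       # 4224 (trainable only)
print(count_params(parameters, requires_grad=False))  # 4864 (all parameters)
```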

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This method requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
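
The exact contrastive loss used by pretrain is not documented here. The description says that embeddings are optimized with respect to neighborhood structure in feature space, controlled by k_neighbors and temperature. The sketch below is a generic InfoNCE-style illustration of that idea, not deeptab's implementation. For a single anchor sample, its k nearest neighbors in the raw feature space act as positives. Similarities are then measured between embeddings and divided by the temperature. The helpers cosine and knn_contrastive_loss are hypothetical, and the effects of use_negative and pool_sequence are not modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_contrastive_loss(features, embeddings, anchor, k_neighbors=2, temperature=0.1):
    """InfoNCE-style loss for one anchor sample.

    The anchor's k nearest neighbours in the raw feature space act as
    positives; all other samples appear only in the denominator.
    """
    others = [i for i in range(len(features)) if i != anchor]
    # Nearest neighbours are found in *feature* space (Euclidean distance).
    others.sort(key=lambda i: math.dist(features[anchor], features[i]))
    positives = others[:k_neighbors]

    # Similarities are measured in *embedding* space and scaled by temperature.
    logits = {i: cosine(embeddings[anchor], embeddings[i]) / temperature for i in others}
    denominator = sum(math.exp(s) for s in logits.values())
    numerator = sum(math.exp(logits[i]) for i in positives)
    return -math.log(numerator / denominator)


features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [-1.0, 0.0]]      # neighbours embedded close
misaligned = [[1.0, 0.0], [-1.0, 0.1], [-1.0, -0.1], [1.0, 0.0]]  # neighbours pushed apart
print(knn_contrastive_loss(features, aligned, anchor=0))
print(knn_contrastive_loss(features, misaligned, anchor=0))
```

The aligned embeddings give a loss near zero, while the misaligned ones are heavily penalized. A lower temperature sharpens this contrast.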

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.SAINTLSS(*args: Any, **kwargs: Any)[source]

SAINT for distributional regression.

This class extends the SklearnBaseLSS class and uses the SAINT model with the default SAINT configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the SAINT model with predefined hyperparameters.

Parameters:
  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • d_model (int, default=128) – Dimensionality of embeddings or model representations.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • ff_dropout (float, default=0.1) – Dropout rate for the feed-forward layers.

  • norm (str, default="LayerNorm") – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • activation (callable, default=nn.SELU()) – Activation function for the transformer layers.

  • transformer_activation (callable, default=ReGLU()) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool, default=False) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • head_layer_sizes (list, default=()) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to use skip connections in the head layers.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • pooling_method (str, default="avg") – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool, default=False) – Whether to use a CLS token for pooling.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import SAINTLSS
>>> model = SAINTLSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family="normal")
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
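
When X_val is not provided, val_size and random_state determine a holdout split of X and y. The sketch below shows one common way to implement such a split with the standard library. holdout_split is a hypothetical helper. Rounding the validation share up is borrowed from scikit-learn's train_test_split and is not confirmed for deeptab.

```python
import math
import random


def holdout_split(n_samples, val_size=0.2, random_state=101):
    """Return (train_indices, val_indices) for a seeded random holdout split."""
    indices = list(range(n_samples))
    # A dedicated Random instance makes the split depend only on random_state.
    random.Random(random_state).shuffle(indices)
    # Round the validation share up, as scikit-learn's train_test_split does.
    n_val = math.ceil(n_samples * val_size)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = holdout_split(10)
print(len(train_idx), len(val_idx))  # 8 2
```

Fixing random_state makes repeated calls produce the same split, so validation scores stay comparable across runs.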

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict, optional) – A dictionary where keys are metric names and values are the metric functions. If None, the default metrics for the distribution family are used.

  • distribution_family (str, optional) – The distribution family the model predicts. If None, it is inferred from the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses the predict method to generate predictions and computes each metric.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Additional arguments specific to the chosen distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The scoring metric. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
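
For family="normal", the model predicts a mean and a scale for each sample. The score is the negative log-likelihood of the observed targets under those predicted distributions. The sketch below computes a per-sample average Gaussian NLL. Averaging rather than summing is an assumption, and so is the parameterization by a standard deviation. The layout of predict's output is not documented here, so gaussian_nll (a hypothetical helper) takes the means and standard deviations directly.

```python
import math


def gaussian_nll(y_true, mu, sigma):
    """Average negative log-likelihood of y_true under Normal(mu, sigma**2)."""
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        # -log N(y | m, s^2) = 0.5 * log(2*pi*s^2) + (y - m)^2 / (2*s^2)
        total += 0.5 * math.log(2 * math.pi * s**2) + (y - m) ** 2 / (2 * s**2)
    return total / len(y_true)


y = [3.0]
print(gaussian_nll(y, mu=[0.0], sigma=[1.0]))  # confident but wrong
print(gaussian_nll(y, mu=[0.0], sigma=[3.0]))  # same miss, honest uncertainty
```

Both predictions miss by the same amount. However, the one that reports wider uncertainty receives a lower (better) NLL, which is why NLL suits distributional models.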

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.AutoIntClassifier(*args: Any, **kwargs: Any)[source]

AutoInt classifier. This class extends the SklearnBaseClassifier class and uses the AutoInt model with the default AutoInt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the AutoInt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • prenorm (bool, default=False) – Whether to apply normalization before the last layer.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • kv_compression (float, default=0.5) – Compression ratio for key-value pairs.

  • kv_compression_sharing (str, default='key-value') – Sharing strategy for key-value compression (‘headwise’ or ‘key-value’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import AutoIntClassifier
>>> model = AutoIntClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
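
Each metrics value is a tuple of (metric function, needs probability scores). The sketch below shows how such a dictionary can be dispatched. Probability-based metrics receive the output of predict_proba, and label-based metrics receive hard labels. The probability rows are stand-in values. Deriving labels as the argmax of the class probabilities is an assumption about how predict relates to predict_proba. accuracy and log_loss are illustrative helpers.

```python
import math

# Stand-ins for predict_proba output on a 3-class problem (rows sum to 1).
proba = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
y_true = [0, 1, 2, 0]

# Assumption: hard labels are the argmax of the class probabilities.
labels = [max(range(len(row)), key=row.__getitem__) for row in proba]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, y_proba):
    return -sum(math.log(row[t]) for t, row in zip(y_true, y_proba)) / len(y_true)


# Value tuples: (metric function, requires probability scores?)
metrics = {
    "Accuracy": (accuracy, False),
    "LogLoss": (log_loss, True),
}

scores = {
    name: fn(y_true, proba if needs_proba else labels)
    for name, (fn, needs_proba) in metrics.items()
}
print(scores)
```

With scikit-learn, sklearn.metrics.accuracy_score and sklearn.metrics.log_loss fit the same two slots. The dictionary would then be passed as the metrics argument of evaluate.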

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping names to metric callables (e.g. torchmetrics metrics) logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
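The following pure-Python sketch shows what k_neighbors and temperature do in one common neighborhood-based contrastive objective. This is an InfoNCE-style loss with multiple positives. The sketch makes two assumptions: the k nearest rows in the raw feature space act as positives for each anchor, and similarity is cosine similarity divided by temperature. deeptab's actual loss may differ. The sketch does not model use_positive, use_negative or pool_sequence. The function name is hypothetical.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_contrastive_loss(embeddings, features, k_neighbors=10, temperature=0.1):
    """Hypothetical multi-positive InfoNCE loss over feature-space neighbors."""
    n = len(embeddings)
    k = min(k_neighbors, n - 1)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Positives: the k rows closest to row i in the original feature space.
        positives = sorted(others, key=lambda j: _sq_dist(features[i], features[j]))[:k]
        # Temperature-scaled cosine similarities to every other row.
        logits = {j: _cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / k
    return total / n


features = [[0.0], [0.1], [5.0], [5.1]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]      # neighbors embedded close
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]   # neighbors embedded apart
loss_aligned = knn_contrastive_loss(aligned, features, k_neighbors=1)
loss_misaligned = knn_contrastive_loss(misaligned, features, k_neighbors=1)
```

A lower temperature sharpens the softmax. With a sharper softmax, misplaced neighbors are penalized more strongly.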

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
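The boolean in the metric tuple decides which prediction method feeds the metric. Below is a minimal sketch of that dispatch. It assumes score calls predict_proba when the flag is True and predict otherwise. ToyClassifier and score_with are hypothetical stand-ins. log_loss is reimplemented in pure Python so the example runs without scikit-learn.

```python
import math


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability of the true class (pure-Python stand-in)."""
    return -sum(math.log(max(p[y], eps)) for y, p in zip(y_true, proba)) / len(y_true)


def accuracy(y_true, y_pred):
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


class ToyClassifier:
    """Hypothetical model with fixed outputs for two samples."""

    def predict_proba(self, X):
        return [[0.9, 0.1], [0.2, 0.8]]

    def predict(self, X):
        # Class label = index of the largest probability.
        return [max(range(len(p)), key=p.__getitem__) for p in self.predict_proba(X)]


def score_with(model, X, y, metric=(log_loss, True)):
    metric_fn, needs_proba = metric
    predictions = model.predict_proba(X) if needs_proba else model.predict(X)
    return metric_fn(y, predictions)


model = ToyClassifier()
nll = score_with(model, None, [0, 1])                        # uses probabilities
acc = score_with(model, None, [0, 1], metric=(accuracy, False))  # uses labels
```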

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.AutoIntRegressor(*args: Any, **kwargs: Any)[source]

AutoInt regressor. This class extends the SklearnBaseRegressor class and uses the AutoInt model with the default AutoInt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the AutoInt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • prenorm (bool, default=False) – Whether to apply normalization before the last layer.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • kv_compression (float, default=0.5) – Compression ratio for key-value pairs.

  • kv_compression_sharing (str, default='key-value') – Sharing strategy for key-value compression (‘headwise’ or ‘key-value’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
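The docstring describes cat_cutoff, treat_all_integers_as_numerical and min_unique_vals only loosely. The sketch below is one plausible reading of how they combine for an integer-valued column; it is not deeptab's code. It rests on three assumptions:

- A float cat_cutoff is a fraction of the number of rows.
- An integer cat_cutoff is an absolute count of unique values.
- A column with fewer than min_unique_vals distinct values is always categorical.

The function name is hypothetical.

```python
def treat_as_categorical(column, cat_cutoff=0.03, treat_all_integers_as_numerical=False,
                         min_unique_vals=5):
    """Hypothetical rule for an integer-valued column; returns True if categorical."""
    if treat_all_integers_as_numerical:
        return False
    n_unique = len(set(column))
    # Assumption: too few distinct values to be treated as numerical.
    if n_unique < min_unique_vals:
        return True
    # Assumption: float cutoff = fraction of rows; int cutoff = absolute unique count.
    limit = cat_cutoff * len(column) if isinstance(cat_cutoff, float) else cat_cutoff
    return n_unique <= limit


binary = [0, 1] * 50                        # 2 unique values in 100 rows
ids = list(range(100))                      # 100 unique values in 100 rows
low_card = [i % 10 for i in range(1000)]    # 10 unique values in 1000 rows
```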

Examples

>>> from deeptab.models import AutoIntRegressor
>>> model = AutoIntRegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object
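Together, lr, lr_patience and lr_factor describe a reduce-on-plateau learning-rate schedule. The sketch below assumes behavior like PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau in "min" mode. It simplifies that scheduler by ignoring its threshold, cooldown and min_lr options. The helper is hypothetical and only tracks the learning rate, not a real optimizer.

```python
def plateau_lr_history(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch (hypothetical helper)."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            # Reduce once more than lr_patience epochs have passed without improvement.
            if bad_epochs > lr_patience:
                lr *= lr_factor
                bad_epochs = 0
        history.append(lr)
    return history


history = plateau_lr_history([1.0, 0.9, 0.95, 0.96, 0.97, 0.8],
                             lr=1e-3, lr_patience=2, lr_factor=0.1)
```

Here the third consecutive epoch without improvement (epoch 5) triggers the reduction from 1e-3 to 1e-4.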

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
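In encode, batch_size controls how many rows pass through the embedding layer at a time; it does not change the result. The minimal sketch below shows that batching loop. A hypothetical toy_embed callable stands in for the model's embedding layer, and plain lists take the place of torch.Tensor.

```python
def encode_in_batches(rows, embed_batch, batch_size=64):
    """Apply embed_batch to consecutive slices of rows and concatenate the outputs."""
    encoded = []
    for start in range(0, len(rows), batch_size):
        encoded.extend(embed_batch(rows[start:start + batch_size]))
    return encoded


batch_sizes_seen = []


def toy_embed(batch):
    """Hypothetical embedding: records the batch size and maps x to [2x]."""
    batch_sizes_seen.append(len(batch))
    return [[2.0 * x] for x in batch]


encoded = encode_in_batches(list(range(150)), toy_embed, batch_size=64)
```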

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
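As the Notes state, evaluate generates predictions once and then computes each metric on them. Below is a minimal sketch with pure-Python mean squared error and mean absolute error. The helper name evaluate_predictions is hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_predictions(y_true, y_pred, metrics):
    """Map each metric name to metric_fn(y_true, y_pred)."""
    return {name: metric_fn(y_true, y_pred) for name, metric_fn in metrics.items()}


scores = evaluate_predictions(
    [3.0, 5.0, 2.0],   # true targets
    [2.5, 5.0, 3.0],   # predictions
    {"MSE": mean_squared_error, "MAE": mean_absolute_error},
)
```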

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
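Early stopping in fit is controlled by patience, monitor and mode. The sketch below assumes semantics similar to PyTorch Lightning's EarlyStopping callback. It simplifies that callback by ignoring min_delta. Training stops once the monitored value has failed to improve for patience consecutive epochs. The function name is hypothetical.

```python
def early_stopping_epoch(history, patience=15, mode="min"):
    """Return the 1-based epoch after which training stops, or None if it never does."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Monitored validation loss per epoch (monitor='val_loss', mode='min').
val_loss = [0.50, 0.40, 0.45, 0.46, 0.47]
stop_epoch = early_stopping_epoch(val_loss, patience=2, mode="min")
```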

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
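In PyTorch, such a count is commonly computed as sum(p.numel() for p in model.parameters() if p.requires_grad). The sketch below mirrors that idiom with a hypothetical Param stand-in, so it runs without PyTorch. The requires_grad argument plays the same role as in get_number_of_params.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Param:
    """Stand-in for a torch parameter: a shape plus a trainability flag."""
    shape: tuple
    requires_grad: bool = True

    def numel(self):
        return prod(self.shape)


def count_params(params, requires_grad=True):
    """Count trainable parameters only (True) or all parameters (False)."""
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)


# A 64 -> 128 linear layer (weight + bias) plus a frozen 10 x 64 embedding table.
layers = [Param((128, 64)), Param((128,)), Param((10, 64), requires_grad=False)]
trainable = count_params(layers)
total = count_params(layers, requires_grad=False)
```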

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
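The description lists what the saved bundle contains. Below is a minimal sketch of a save/load round trip that keeps the four pieces in a single dictionary and writes it with pickle. The dictionary keys, the helper names and the use of pickle are illustrative assumptions, not deeptab's on-disk format. Only unpickle files from sources you trust.

```python
import os
import pickle
import tempfile


def save_bundle(path, config, preprocessor, feature_info, state_dict):
    """Write everything inference needs into one file (hypothetical layout)."""
    bundle = {
        "config": config,
        "preprocessor": preprocessor,
        "feature_info": feature_info,
        "state_dict": state_dict,
    }
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    """Read the bundle back; only unpickle trusted files."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    save_bundle(path, {"d_model": 64}, {"scaling": "minmax"},
                {"numerical": ["age"]}, {"head.weight": [[0.1, 0.2]]})
    restored = load_bundle(path)
```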

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.AutoIntLSS(*args: Any, **kwargs: Any)[source]

AutoInt for distributional regression.

This class extends the SklearnBaseLSS class and uses the AutoInt model with the default AutoInt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the AutoInt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_layers (int, default=4) – Number of transformer layers.

  • n_heads (int, default=8) – Number of attention heads in the transformer.

  • attn_dropout (float, default=0.2) – Dropout rate for the attention mechanism.

  • transformer_dim_feedforward (int, default=256) – Dimensionality of the feed-forward layers in the transformer.

  • prenorm (bool, default=False) – Whether to apply normalization before the last layer.

  • bias (bool, default=True) – Whether to use bias in linear layers.

  • cat_encoding (str, default="int") – Method for encoding categorical features (‘int’, ‘one-hot’, or ‘linear’).

  • kv_compression (float, default=0.5) – Compression ratio for key-value pairs.

  • kv_compression_sharing (str, default='key-value') – Sharing strategy for key-value compression (‘headwise’ or ‘key-value’).

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
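With the default numerical_preprocessing="ple", each numerical value is expanded into one feature per bin. The sketch below shows two things:

- uniform versus quantile bin edges, the two binning_strategy options;
- piecewise linear encoding (PLE) on top of those edges.

It is a simplified illustration. deeptab's preprocessor may handle edges, ties and decision-tree bins differently. The function names are hypothetical.

```python
def bin_edges(values, n_bins, strategy="uniform"):
    """Return n_bins + 1 edges spanning the observed range."""
    lo, hi = min(values), max(values)
    if strategy == "uniform":
        step = (hi - lo) / n_bins
        return [lo + i * step for i in range(n_bins + 1)]
    if strategy == "quantile":
        s = sorted(values)
        edges = []
        for i in range(n_bins + 1):
            # Linear interpolation between neighboring order statistics.
            pos = i / n_bins * (len(s) - 1)
            lower = int(pos)
            upper = min(lower + 1, len(s) - 1)
            edges.append(s[lower] + (s[upper] - s[lower]) * (pos - lower))
        return edges
    raise ValueError(f"unknown strategy: {strategy}")


def ple_encode(x, edges):
    """Piecewise linear encoding: 1 for bins below x, 0 for bins above, a fraction inside."""
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x >= right:
            encoding.append(1.0)
        elif x < left:
            encoding.append(0.0)
        else:
            encoding.append((x - left) / (right - left))
    return encoding


values = [0, 1, 2, 3, 4, 100]
uniform = bin_edges(values, 2, "uniform")    # [0.0, 50.0, 100.0]
quantile = bin_edges(values, 2, "quantile")  # [0.0, 2.5, 100.0]
```

Quantile edges adapt to skewed data. Here, the outlier at 100 no longer leaves most values crowded into the first uniform bin.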

Examples

>>> from deeptab.models import AutoIntLSS
>>> model = AutoIntLSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family="normal")
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a particular distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics metrics) to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
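For family="normal", the negative log-likelihood scores the predicted distribution against the observed targets. The worked sketch below assumes the model's predictions for this family can be read as a mean mu and a standard deviation sigma per sample. Under that assumption, the per-sample NLL is 0.5 * log(2 * pi * sigma**2) + (y - mu)**2 / (2 * sigma**2), averaged over samples. The function name is hypothetical. deeptab may sum instead of average, or parameterize the scale differently.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Average negative log-likelihood of y under Normal(mu, sigma**2)."""
    total = 0.0
    for y_i, mu_i, s_i in zip(y, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * s_i ** 2) + (y_i - mu_i) ** 2 / (2 * s_i ** 2)
    return total / len(y)


exact_hit = gaussian_nll([0.0], [0.0], [1.0])         # equals 0.5 * log(2 * pi)
confident_miss = gaussian_nll([2.0], [0.0], [1.0])    # adds (2 - 0)**2 / 2 = 2
uncertain_miss = gaussian_nll([2.0], [0.0], [2.0])    # wider sigma softens the miss
```

Unlike mean squared error, this score also rewards calibrated uncertainty. For the same miss, a wider predicted sigma costs less.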

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.ENODEClassifier(*args: Any, **kwargs: Any)[source]

ENODE classifier, a variant of the Neural Oblivious Decision Ensemble (NODE) that uses an MLP as a tabular-task-specific head. This class extends the SklearnBaseClassifier class and uses the ENODE model with the default ENODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
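The two binning_strategy options behave very differently on a skewed feature. The NumPy sketch below shows how equal-width ("uniform") and equal-frequency ("quantile") edges are typically computed for n_bins bins. The helper bin_edges is hypothetical and is not deeptab's internal implementation.

```python
import numpy as np

def bin_edges(x, n_bins, strategy="uniform"):
    """Return n_bins + 1 bin edges for a 1-D numeric feature (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if strategy == "uniform":
        # Equal-width bins spanning [min, max]
        return np.linspace(x.min(), x.max(), n_bins + 1)
    if strategy == "quantile":
        # Equal-frequency bins: edges at evenly spaced quantiles of the data
        return np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    raise ValueError(f"unknown strategy: {strategy!r}")

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)          # a right-skewed feature
uni = bin_edges(x, 4, "uniform")
qnt = bin_edges(x, 4, "quantile")
counts_uni, _ = np.histogram(x, bins=uni)
counts_qnt, _ = np.histogram(x, bins=qnt)
```

With quantile edges each bin holds roughly the same number of samples, whereas uniform edges leave most samples of a skewed feature in the first bin.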

Examples

>>> from deeptab.models import ENODEClassifier
>>> model = ENODEClassifier()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built classifier.

Return type:

object
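When X_val is None, a fraction val_size of the rows is held out for validation, and random_state makes that split reproducible. The following is a minimal NumPy sketch of such a split, assuming a simple shuffled hold-out. The helper split_train_val is hypothetical; deeptab may perform the split differently (for example with a library routine).

```python
import numpy as np

def split_train_val(X, y, val_size=0.2, random_state=101):
    """Hold out a shuffled fraction `val_size` of the rows for validation (hypothetical helper)."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))              # reproducible shuffle
    n_val = int(round(val_size * len(X)))      # size of the validation split
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

X = np.arange(20).reshape(10, 2)   # row i is [2i, 2i + 1]
y = np.arange(10)                  # label i belongs to row i
X_tr, X_va, y_tr, y_va = split_train_val(X, y, val_size=0.2, random_state=101)
```

Rows and labels are indexed with the same permutation, so each sample keeps its label; passing X_val and y_val explicitly skips this step entirely.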

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
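batch_size only controls how many rows pass through the embedding layer at a time; the stacked result matches encoding everything at once. The NumPy sketch below illustrates this, with a fixed random linear map standing in for the trained embedding layer. That stand-in, and the helper encode_in_batches, are assumptions for illustration only.

```python
import numpy as np

def encode_in_batches(X, embed_fn, batch_size=64):
    """Apply embed_fn to consecutive row chunks and stack the results."""
    X = np.asarray(X, dtype=float)
    chunks = [embed_fn(X[start:start + batch_size])
              for start in range(0, len(X), batch_size)]
    return np.concatenate(chunks, axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))        # stand-in for learned embedding weights

def embed(batch):
    """Hypothetical embedding layer: a linear projection to 3 dimensions."""
    return batch @ W

X = rng.normal(size=(150, 5))
Z = encode_in_batches(X, embed, batch_size=64)   # batches of 64, 64 and 22 rows
```

A smaller batch_size lowers peak memory at the cost of more forward passes.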

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array with embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
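The metrics argument pairs each metric function with a flag that says whether it consumes probabilities. The plain-Python sketch below shows that dispatch. evaluate_with and ToyModel are hypothetical stand-ins, and both metrics are written out by hand rather than imported from scikit-learn.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, proba, eps=1e-15):
    # proba[i][k] is the predicted probability of class k for sample i
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)

class ToyModel:
    """Hypothetical fitted classifier exposing predict and predict_proba."""
    def predict_proba(self, X):
        return [[0.8, 0.2] if x < 0 else [0.3, 0.7] for x in X]

    def predict(self, X):
        # Label = index of the most probable class
        return [max(range(len(p)), key=p.__getitem__) for p in self.predict_proba(X)]

def evaluate_with(model, X, y_true, metrics):
    scores = {}
    for name, (fn, needs_proba) in metrics.items():
        preds = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = fn(y_true, preds)
    return scores

scores = evaluate_with(
    ToyModel(), [-1.0, 2.0, 3.0, -0.5], [0, 1, 0, 0],
    {"accuracy": (accuracy, False), "log_loss": (log_loss, True)},
)
```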

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
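patience, monitor and mode follow the usual early-stopping pattern: training stops once the monitored value has failed to improve for patience consecutive epochs. Given the Lightning Trainer keyword arguments, the check is presumably done by a PyTorch Lightning callback during training. The plain-Python sketch below applies the same rule to a recorded curve; the helper early_stop_epoch is hypothetical and ignores refinements such as a minimum improvement delta.

```python
import operator

def early_stop_epoch(history, patience=15, mode="min"):
    """Return the epoch index at which training would stop, or None."""
    better = operator.lt if mode == "min" else operator.gt
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        if best is None or better(value, best):
            best, wait = value, 0      # improvement: reset the counter
        else:
            wait += 1                  # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None

val_loss = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.69, 0.70, 0.70, 0.71]
stop_p3 = early_stop_epoch(val_loss, patience=3)   # stops after three flat epochs
stop_p4 = early_stop_epoch(val_loss, patience=4)   # the dip at epoch 6 resets the counter
```

For a metric that should grow, such as accuracy, pass mode="max" together with the matching monitor name.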

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
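For a probabilistic classifier, the label from predict is normally the most probable class in the corresponding row of predict_proba. The NumPy sketch below shows that relationship. It assumes that classes holds the original labels in column order; how deeptab encodes and decodes labels is not stated here.

```python
import numpy as np

proba = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6]])           # shape (n_samples, n_classes)
classes = np.array(["cat", "dog", "fox"])     # hypothetical label order

assert np.allclose(proba.sum(axis=1), 1.0)    # each row is a distribution
labels = classes[np.argmax(proba, axis=1)]    # highest-probability class per row
```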

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This method requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
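The roles of k_neighbors and temperature can be illustrated with a generic InfoNCE-style loss in which each row's k nearest neighbours in feature space act as positives. This NumPy sketch shows the general technique under that assumption. It is not a reproduction of deeptab's _pretrain loss; the handling of negatives, pooling and the neighbour search may differ there.

```python
import numpy as np

def knn_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """InfoNCE-style loss where neighbours in feature space are positives."""
    # Pairwise squared distances in the original feature space
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                     # exclude self-matches
    pos = np.argsort(d, axis=1)[:, :k_neighbors]    # indices of k nearest neighbours

    # Cosine similarities between embeddings, scaled by the temperature
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)               # a row is never its own candidate
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positives
    return -np.take_along_axis(log_prob, pos, axis=1).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
loss_aligned = knn_contrastive_loss(X, X, k_neighbors=5)                 # embeddings mirror features
loss_random = knn_contrastive_loss(X, rng.normal(size=(50, 8)), k_neighbors=5)
```

A lower temperature sharpens the softmax, so embeddings are penalised more strongly when a neighbour is not among the most similar rows.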

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.ENODERegressor(*args: Any, **kwargs: Any)[source]

Neural Oblivious Decision Ensemble (ENODE) Regressor. It differs slightly from the original NODE model by using an MLP as a tabular-task-specific head. This class extends the SklearnBaseRegressor class and uses the ENODE model with the default ENODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
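How cat_cutoff, min_unique_vals and treat_all_integers_as_numerical interact is not spelled out above. One plausible reading is that a float cutoff is a ratio of unique values to rows and an int cutoff an absolute count of unique values. The plain-Python sketch below encodes that reading. The function is_categorical_integer is hypothetical and should be checked against the deeptab source before relying on it.

```python
def is_categorical_integer(values, cat_cutoff=0.03, min_unique_vals=5,
                           treat_all_integers_as_numerical=False):
    """Hypothetical rule for deciding whether an integer column is categorical."""
    if treat_all_integers_as_numerical:
        return False
    n_unique = len(set(values))
    if n_unique < min_unique_vals:
        return True                                 # too few distinct values to be numerical
    if isinstance(cat_cutoff, float):
        return n_unique / len(values) < cat_cutoff  # ratio threshold (assumed)
    return n_unique < cat_cutoff                    # absolute-count threshold (assumed)

store_ids = [i % 20 for i in range(1000)]   # 20 distinct values in 1000 rows
ages = [18 + i % 60 for i in range(1000)]   # 60 distinct values in 1000 rows
flags = [i % 2 for i in range(1000)]        # binary column
```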

Examples

>>> from deeptab.models import ENODERegressor
>>> model = ENODERegressor()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • train_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of torchmetrics metrics to log during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
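lr_patience and lr_factor describe a reduce-on-plateau schedule: once the validation loss has stopped improving for more than lr_patience epochs, the learning rate is multiplied by lr_factor. PyTorch provides this as torch.optim.lr_scheduler.ReduceLROnPlateau; whether deeptab uses exactly that scheduler is an assumption. The dependency-free sketch below mimics its basic rule and ignores its threshold and cooldown options.

```python
def lr_schedule(val_losses, lr=1e-3, lr_patience=2, lr_factor=0.1):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    best = float("inf")
    bad_epochs = 0
    lrs = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0     # improvement resets the counter
        else:
            bad_epochs += 1
        if bad_epochs > lr_patience:       # waited lr_patience epochs already
            lr *= lr_factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs

lrs = lr_schedule([1.0, 0.9, 0.95, 0.93, 0.92, 0.91, 0.85],
                  lr=1e-3, lr_patience=2, lr_factor=0.1)
```

Setting lr_patience below the early-stopping patience gives the reduced learning rate a chance to help before training is stopped.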

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
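With prune_by_epoch=True, a trial can be abandoned early if its validation loss at prune_epoch already looks worse than that of earlier trials. The sketch below shows one simple version of such a rule, comparing against the median of previous trials at the same epoch in the spirit of Optuna's MedianPruner. Both the median criterion and the helper should_prune are assumptions and are not taken from deeptab.

```python
import statistics

def should_prune(curve, previous_curves, prune_epoch=5):
    """Prune if this trial's loss at prune_epoch exceeds the median of
    earlier trials' losses at the same epoch (hypothetical rule)."""
    if len(curve) <= prune_epoch:
        return False                       # epoch not reached yet
    earlier = [c[prune_epoch] for c in previous_curves if len(c) > prune_epoch]
    if not earlier:
        return False                       # nothing to compare against
    return curve[prune_epoch] > statistics.median(earlier)

history = [
    [1.0, 0.8, 0.6, 0.5, 0.45, 0.40, 0.38],
    [1.2, 0.9, 0.7, 0.6, 0.55, 0.50, 0.48],
    [0.9, 0.7, 0.5, 0.4, 0.35, 0.30, 0.29],
]
```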

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This method requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
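save() writes a single bundle that holds everything inference needs, and load() rebuilds the estimator from it. The round trip can be pictured with the standard-library pickle module and a plain dictionary. The bundle keys and the helpers save_bundle and load_bundle are hypothetical and do not describe deeptab's actual file format.

```python
import os
import pickle
import tempfile

def save_bundle(path, config, preprocessor_state, feature_info, weights):
    """Write every piece of inference state into one file (illustrative format)."""
    bundle = {
        "config": config,
        "preprocessor": preprocessor_state,
        "feature_info": feature_info,
        "weights": weights,
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)

def load_bundle(path):
    """Read the bundle back; a real loader would rebuild the estimator from it."""
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_bundle(path, {"depth": 6}, {"scaler_min": [0.0]},
                {"num_features": ["x"]}, {"w": [0.1, 0.2]})
    restored = load_bundle(path)
```

Keeping config, preprocessor and weights together avoids the common mistake of restoring weights without the matching preprocessing. Pickle-based files should only be loaded from trusted sources.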

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.ENODELSS(*args: Any, **kwargs: Any)[source]

Neural Oblivious Decision Ensemble (ENODE) for distributional regression. It differs slightly from the original NODE model by using an MLP as a tabular-task-specific head. This class extends the SklearnBaseLSS class and uses the ENODE model with the default ENODE configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Neural Oblivious Decision Ensemble (NODE) model.

Parameters:
  • num_layers (int, default=4) – Number of dense layers in the model.

  • layer_dim (int, default=128) – Dimensionality of each dense layer.

  • tree_dim (int, default=1) – Dimensionality of the output from each tree leaf.

  • depth (int, default=6) – Depth of each decision tree in the ensemble.

  • norm (str, default=None) – Type of normalization to use in the model.

  • head_layer_sizes (list, default=()) – Sizes of the layers in the model’s head.

  • head_dropout (float, default=0.5) – Dropout rate for the head layers.

  • head_skip_layers (bool, default=False) – Whether to skip layers in the head.

  • head_activation (callable, default=nn.SELU()) – Activation function for the head layers.

  • head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import ENODELSS
>>> model = ENODELSS()
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
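With family='normal', the model is assumed to predict a mean μ and a scale σ for each sample, and distributional models like this one are commonly scored by the negative log-likelihood (NLL). For a normal distribution the per-sample NLL is 0.5·log(2πσ²) + (y − μ)²/(2σ²). The plain-Python sketch below averages it over samples. How deeptab parameterises and transforms σ internally is not shown here.

```python
import math

def normal_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2)."""
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("sigma must be positive")
        total += 0.5 * math.log(2 * math.pi * si ** 2) + (yi - mi) ** 2 / (2 * si ** 2)
    return total / len(y)

nll = normal_nll([0.0], [0.0], [1.0])   # 0.5 * log(2*pi), about 0.9189
```

The NLL rewards both an accurate mean and a well-calibrated scale: a confident (small σ) prediction scores better when it is right and much worse when it is wrong.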

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built distributional regressor.

Return type:

object
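The interaction between val_size, X_val and y_val described above can be sketched as follows. This is a hypothetical standard-library helper (resolve_validation_split), not deeptab's internal code. It assumes a shuffled split seeded by random_state, similar in spirit to scikit-learn's train_test_split.

```python
import random


def resolve_validation_split(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Hypothetical helper mirroring the documented X_val / val_size rules."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        # A user-supplied validation set is used as-is; val_size is ignored.
        return X, y, X_val, y_val
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # seeded shuffle before splitting
    n_val = int(round(len(X) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )
```

Fixing random_state makes the split reproducible across calls, which matters when comparing runs that should see the same validation rows.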

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions. If None, the default metrics for the distribution family are used (see get_default_metrics).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method generates predictions with the predict method and computes each metric on them.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
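The patience, monitor and mode arguments follow the usual early-stopping pattern. The sketch below is a simplified, hypothetical re-implementation (epochs_until_stop). It assumes any strict improvement of the monitored value resets the counter. PyTorch Lightning's EarlyStopping callback additionally supports options such as min_delta, which are omitted here.

```python
def epochs_until_stop(history, patience=15, mode="min"):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    history: values of the monitored metric (e.g. "val_loss"), one per epoch.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best, wait = value, 0  # new best value: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # `patience` epochs in a row without improvement
    return None


print(epochs_until_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73], patience=3))  # stops at epoch 6
```

With mode="max", the same logic applies to metrics where larger is better, such as accuracy.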

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
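Counting parameters amounts to summing the element counts of the relevant weight tensors. With PyTorch, the usual idiom is sum(p.numel() for p in model.parameters() if p.requires_grad). To stay self-contained, the sketch below uses a hypothetical Param stand-in instead of torch.nn.Parameter, and count_params is a hypothetical helper rather than deeptab's implementation.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Param:
    """Hypothetical stand-in for a tensor parameter: a shape and a trainable flag."""
    shape: tuple
    requires_grad: bool = True

    def numel(self):
        return prod(self.shape)


def count_params(params, requires_grad=True):
    # requires_grad=True counts trainable parameters only; False counts all of them.
    return sum(p.numel() for p in params if p.requires_grad or not requires_grad)


params = [Param((64, 32)), Param((32,)), Param((10, 64), requires_grad=False)]
print(count_params(params))                       # trainable only
print(count_params(params, requires_grad=False))  # including frozen weights
```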

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list
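The prune_by_epoch option decides which value a trial is judged on. The hypothetical helpers below sketch one plausible rule. They assume a trial is pruned when its score is worse than the best score of earlier trials, where the score is either the validation loss at prune_epoch or the trial's best validation loss. deeptab's actual pruning criterion may differ.

```python
def trial_score(val_losses, prune_by_epoch=True, prune_epoch=5):
    """Value a trial is compared on (hypothetical rule)."""
    if prune_by_epoch:
        # Judge the trial by its validation loss at a fixed (1-based) epoch.
        return val_losses[prune_epoch - 1]
    # Otherwise judge it by the best validation loss it reached.
    return min(val_losses)


def should_prune(val_losses, best_score, prune_by_epoch=True, prune_epoch=5):
    """Prune when this trial scores worse than the best earlier trial."""
    if prune_by_epoch and len(val_losses) < prune_epoch:
        return False  # not enough epochs yet to judge the trial
    return trial_score(val_losses, prune_by_epoch, prune_epoch) > best_score
```

Judging at a fixed early epoch discards weak configurations cheaply, at the risk of pruning slow starters that would have improved later.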

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The metric to compute. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
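For family='normal', an NLL score is the average negative log-likelihood of the targets under the predicted normal distributions, so lower is better. The following standard-library sketch assumes that a mean and a standard deviation are available for each sample. How predict arranges these values is not specified here, and gaussian_nll is a hypothetical helper.

```python
import math


def gaussian_nll(y_true, means, stds):
    """Mean negative log-likelihood of y_true under N(mean, std**2)."""
    if not (len(y_true) == len(means) == len(stds)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for y, mu, sigma in zip(y_true, means, stds):
        if sigma <= 0:
            raise ValueError("standard deviations must be positive")
        # -log N(y | mu, sigma^2) = 0.5 * log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)
        total += 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
    return total / len(y_true)
```

Unlike MSE, this score penalizes both inaccurate means and poorly calibrated uncertainty (a standard deviation that is too small or too large).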

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

Experimental Models

Warning

The classes below live in deeptab.models.experimental. Their API may change without a deprecation cycle. Import them explicitly:

from deeptab.models.experimental import ModernNCAClassifier
class deeptab.models.experimental.ModernNCAClassifier(*args: Any, **kwargs: Any)[source]

ModernNCA classifier. This class extends the SklearnBaseClassifier class and uses the ModernNCA model with the default ModernNCA configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Default configuration for the ModernNCA model.

Parameters:
  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
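The way cat_cutoff, treat_all_integers_as_numerical and min_unique_vals could interact is illustrated below with a hypothetical type-detection rule (detect_feature_type). The sketch assumes that a float cat_cutoff is a ratio of unique values to samples and an int cat_cutoff is an absolute count of unique values. It is an illustration under those assumptions, not deeptab's preprocessing code.

```python
def detect_feature_type(column, cat_cutoff=0.03, treat_all_integers_as_numerical=False,
                        min_unique_vals=5):
    """Return "numerical" or "categorical" for one column (hypothetical rule)."""
    if any(isinstance(v, str) for v in column):
        return "categorical"  # non-numeric values are always categorical
    n_unique = len(set(column))
    if n_unique < min_unique_vals:
        return "categorical"  # too few distinct values to treat as numerical
    is_integer = all(float(v).is_integer() for v in column)
    if is_integer and not treat_all_integers_as_numerical:
        # Assumption: a float cutoff is a ratio of the sample count, an int cutoff a count.
        threshold = cat_cutoff * len(column) if isinstance(cat_cutoff, float) else cat_cutoff
        if n_unique <= threshold:
            return "categorical"
    return "numerical"


codes = [i % 6 for i in range(1000)]  # 6 distinct integer codes in 1000 rows
print(detect_feature_type(codes))                                        # categorical
print(detect_feature_type(codes, treat_all_integers_as_numerical=True))  # numerical
```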

Examples

>>> from deeptab.models.experimental import ModernNCAClassifier
>>> model = ModernNCAClassifier(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension)) – List or array of embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
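The metrics dictionary pairs each metric function with a flag that selects probability scores (True) or class labels (False). This is the same tuple format used by the default of score, (log_loss, True). The dispatch can be sketched in plain Python. Here evaluate_metrics, accuracy and log_loss are hypothetical stand-ins; with scikit-learn you would typically pass sklearn.metrics.accuracy_score and sklearn.metrics.log_loss instead.

```python
import math


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    # Assumes integer labels 0..K-1; proba[i][k] is the probability of class k for sample i.
    return -sum(math.log(max(p[t], eps)) for t, p in zip(y_true, proba)) / len(y_true)


def evaluate_metrics(y_true, labels, proba, metrics):
    """Apply each (metric_fn, needs_proba) pair to either probabilities or labels."""
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        scores[name] = metric_fn(y_true, proba if needs_proba else labels)
    return scores


y_true = [0, 1, 1]
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # e.g. output of predict_proba
labels = [0, 1, 0]                            # e.g. output of predict
print(evaluate_metrics(y_true, labels, proba,
                       {"accuracy": (accuracy, False), "log_loss": (log_loss, True)}))
```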

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
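lr_patience and lr_factor describe a reduce-on-plateau schedule. The following standard-library simulation assumes behaviour similar to PyTorch's torch.optim.lr_scheduler.ReduceLROnPlateau in "min" mode, where the rate is reduced only after more than lr_patience consecutive epochs without improvement. Its threshold, cooldown and min_lr options are ignored. simulate_plateau_schedule is a hypothetical helper.

```python
def simulate_plateau_schedule(val_losses, lr=1e-3, lr_patience=10, lr_factor=0.1):
    """Return the learning rate in effect after each epoch's scheduler step."""
    best = float("inf")
    bad_epochs = 0
    lrs = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > lr_patience:
            # More than `lr_patience` epochs without improvement: shrink the learning rate.
            lr *= lr_factor
            bad_epochs = 0
        lrs.append(lr)
    return lrs


print(simulate_plateau_schedule([1.0, 0.9, 0.9, 0.9, 0.9], lr=1.0, lr_patience=2, lr_factor=0.5))
```

Pairing a small lr_patience with a larger early-stopping patience gives the optimizer a chance to recover at a lower learning rate before training is stopped.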

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
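The roles of k_neighbors and temperature can be illustrated with a temperature-scaled contrastive (InfoNCE-style) loss. In this loss, each sample's k nearest neighbours in the raw feature space act as positives. This is a simplified, hypothetical formulation written with the standard library. It is not the loss deeptab computes, and it leaves out the use_negative option and sequence pooling.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighbour_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """Mean contrastive loss where the k nearest feature-space neighbours are positives."""
    n = len(features)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # k nearest neighbours of sample i in the raw feature space.
        positives = sorted(others, key=lambda j: math.dist(features[i], features[j]))[:k_neighbors]
        # Temperature-scaled cosine similarities between embeddings.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n
```

A lower temperature sharpens the softmax over similarities, so embeddings that place neighbours far apart are penalized more strongly.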

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
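The save/load contract described above (refuse to save an unfitted model, and restore everything needed for inference from one file) can be sketched with pickle. TinyEstimator, its attributes and the roundtrip helper are hypothetical. They only mimic the documented bundle contents (config, fitted preprocessor, feature metadata, weights); deeptab's actual file format is not described here.

```python
import os
import pickle
import tempfile


class TinyEstimator:
    """Hypothetical 1-D linear model illustrating the save()/load() contract."""

    def __init__(self, **config):
        self.config = config
        self.preprocessor = None  # fitted preprocessing state
        self.feature_info = None  # feature metadata
        self.weights = None       # model weights

    def fit(self, X, y):
        mean = sum(X) / len(X)
        xc = [x - mean for x in X]  # centred inputs (X must not be constant)
        self.preprocessor = {"mean": mean}
        self.feature_info = {"n_features": 1}
        self.weights = {"w": sum(a * t for a, t in zip(xc, y)) / sum(a * a for a in xc),
                        "b": sum(y) / len(y)}
        return self

    def predict(self, X):
        m = self.preprocessor["mean"]
        return [self.weights["w"] * (x - m) + self.weights["b"] for x in X]

    def save(self, path):
        if self.weights is None:
            raise ValueError("The model has not been fitted yet.")
        bundle = {"config": self.config, "preprocessor": self.preprocessor,
                  "feature_info": self.feature_info, "weights": self.weights}
        with open(path, "wb") as f:
            pickle.dump(bundle, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            bundle = pickle.load(f)
        model = cls(**bundle["config"])
        model.preprocessor = bundle["preprocessor"]
        model.feature_info = bundle["feature_info"]
        model.weights = bundle["weights"]
        return model


def roundtrip(model):
    """Save `model` to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        model.save(path)
        return type(model).load(path)
```

Because pickle can execute code while loading, bundles like this should only be loaded from trusted sources.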

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.ModernNCARegressor(*args: Any, **kwargs: Any)[source]

ModernNCA regressor. This class extends the SklearnBaseRegressor class and uses the ModernNCA model with the default ModernNCA configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Default configuration for the ModernNCA model.

Parameters:
  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
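The default numerical_preprocessing="ple" refers to piecewise linear encoding. Given bin edges b_0 < b_1 < ... < b_T (n_bins controls T), a value is mapped to T components. A component is 1 for a bin lying entirely below the value, 0 for a bin above it, and the fractional position within the bin that contains the value. The sketch below follows this common definition with uniform edges and clamps every component to [0, 1]. Edge handling in deeptab's implementation may differ, and both helpers are hypothetical.

```python
def uniform_edges(values, n_bins):
    """Hypothetical helper: n_bins + 1 evenly spaced bin edges over the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot bin a constant feature")
    return [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]


def ple_encode(x, edges):
    """Piecewise linear encoding of a scalar x (clamped variant)."""
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        frac = (x - left) / (right - left)  # position of x relative to this bin
        encoding.append(min(1.0, max(0.0, frac)))
    return encoding


edges = uniform_edges([0, 4], n_bins=4)  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(ple_encode(2.5, edges))            # [1.0, 1.0, 0.5, 0.0]
```

Unlike one-hot binning, this encoding keeps the ordering of values and varies continuously with x within each bin.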

Examples

>>> from deeptab.models.experimental import ModernNCARegressor
>>> model = ModernNCARegressor(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
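For the regressor, each entry of metrics is a plain function applied to y_true and the output of predict. The following standard-library sketch uses mse, mae and evaluate_regression as hypothetical stand-ins. With scikit-learn, you would typically pass sklearn.metrics.mean_squared_error or sklearn.metrics.mean_absolute_error instead.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_regression(y_true, y_pred, metrics):
    """Compute every metric in `metrics` (name -> function) on the same predictions."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


print(evaluate_regression([1, 2, 3], [1, 2, 5], {"mse": mse, "mae": mae}))
```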

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.

  • val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
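The idea of optimizing embeddings with respect to neighborhood structure in the feature space can be illustrated with a small InfoNCE-style loss in pure Python: for each sample, its k_neighbors nearest neighbors in the original feature space act as positives, and cosine similarities between embeddings are divided by temperature before a softmax over all other samples. This is a hypothetical sketch under those assumptions (it ignores use_negative and pool_sequence); it is not the loss implemented by deeptab's _pretrain().

```python
import math


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def neighbor_contrastive_loss(features, embeddings, k_neighbors=10, temperature=0.1):
    """InfoNCE-style loss with the k nearest neighbors (in feature space) as positives.

    Hypothetical sketch, not deeptab's _pretrain implementation.
    """
    n = len(features)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Positives: the k closest samples in the *input feature* space.
        others.sort(key=lambda j: _euclidean(features[i], features[j]))
        positives = others[:k_neighbors]
        # Temperature-scaled cosine similarities in the *embedding* space.
        logits = {j: _cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        denom = sum(math.exp(v) for v in logits.values())
        numer = sum(math.exp(logits[j]) for j in positives)
        total += -math.log(numer / denom)
    return total / n


features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
print(neighbor_contrastive_loss(features, aligned, k_neighbors=1))
print(neighbor_contrastive_loss(features, misaligned, k_neighbors=1))
```

Embeddings that keep feature-space neighbors close yield a much lower loss than embeddings that scramble them; a smaller temperature sharpens that contrast.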

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
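Because metric only needs the metric(y_true, y_pred) signature, any such callable can be passed, for example sklearn.metrics.mean_absolute_error or a hand-written function. The pure-Python function below is an illustrative stand-in, not part of deeptab.

```python
def mean_absolute_error(y_true, y_pred):
    """Custom metric with the metric(y_true, y_pred) signature expected by score()."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 3.0, 5.0]))  # 1.0
```

It would then be passed as model.score(X_test, y_test, metric=mean_absolute_error).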

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.ModernNCALSS(*args: Any, **kwargs: Any)[source]

ModernNCA for distributional regression. This class extends the SklearnBaseLSS class and uses the ModernNCA model with the default ModernNCA configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Default configuration for the ModernNCA model.

Parameters:
  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
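To make the difference between the two binning_strategy options concrete, the sketch below computes n_bins + 1 bin edges in pure Python: "uniform" splits the observed range into equal-width intervals, while "quantile" places edges at evenly spaced quantiles so that each bin holds roughly the same number of samples. It is an illustration under those assumptions, not deeptab's preprocessing code; the quantiles use linear interpolation between order statistics, like NumPy's default quantile method.

```python
def uniform_edges(values, n_bins):
    """Equal-width bin edges spanning the observed range."""
    if not values or n_bins < 1:
        raise ValueError("need at least one value and one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Equal-frequency bin edges at evenly spaced quantiles (linear interpolation)."""
    if not values or n_bins < 1:
        raise ValueError("need at least one value and one bin")
    s = sorted(values)
    n = len(s)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (n - 1) / n_bins  # fractional index of the i-th quantile
        lo_idx = int(pos)
        hi_idx = min(lo_idx + 1, n - 1)
        frac = pos - lo_idx
        edges.append(s[lo_idx] + frac * (s[hi_idx] - s[lo_idx]))
    return edges


skewed = [0, 1, 2, 3, 100]
print(uniform_edges(skewed, 4))   # [0.0, 25.0, 50.0, 75.0, 100.0]
print(quantile_edges(skewed, 4))  # [0.0, 1.0, 2.0, 3.0, 100.0]
```

With the outlier at 100, uniform binning puts four of the five values in the first bin and leaves two bins empty, whereas the quantile edges follow the data.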

Examples

>>> from deeptab.models.experimental import ModernNCALSS
>>> model = ModernNCALSS(d_model=64, n_layers=8)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it is inferred from the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Additional arguments specific to the chosen distribution family.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
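The patience, monitor, and mode arguments follow the usual early-stopping pattern. The sketch below replays a recorded history of the monitored metric and reports the epoch at which training would stop. It assumes that stopping happens once patience consecutive epochs pass without strict improvement (as in PyTorch Lightning's EarlyStopping with min_delta=0); the helper name is hypothetical.

```python
def early_stopping_epoch(history, patience=15, mode="min"):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    Hypothetical helper: replays recorded values of the monitored metric
    (e.g. "val_loss") and counts consecutive epochs without strict improvement.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        if best is None or (value < best if mode == "min" else value > best):
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stopping_epoch([1.0, 0.8, 0.85, 0.9, 0.95], patience=3))  # 4
```

For a metric such as validation accuracy, mode="max" flips the comparison so that larger values count as improvements.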

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – Scoring metric. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
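For family='normal', the negative log-likelihood of a target y under a predicted mean μ and standard deviation σ is ½·log(2πσ²) + (y − μ)²/(2σ²). The pure-Python sketch below averages this over samples; whether deeptab averages or sums, and how it parameterizes the scale, are assumptions here.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under Normal(mu, sigma**2).

    Illustrative sketch; deeptab's internal NLL may differ in reduction or parameterization.
    """
    if not (len(y) == len(mu) == len(sigma)):
        raise ValueError("y, mu and sigma must have the same length")
    total = 0.0
    for yi, mi, si in zip(y, mu, sigma):
        if si <= 0:
            raise ValueError("sigma must be positive")
        total += 0.5 * math.log(2 * math.pi * si**2) + (yi - mi) ** 2 / (2 * si**2)
    return total / len(y)


print(round(gaussian_nll([0.0], [0.0], [1.0]), 4))  # 0.9189, i.e. 0.5 * log(2*pi)
```

Lower values are better: the score penalizes both a mean far from the target and a poorly calibrated spread.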

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
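get_params and set_params follow scikit-learn's estimator conventions: get_params returns a dict of constructor arguments, and set_params updates them and returns the estimator itself, which lets tools such as GridSearchCV and clone reconfigure a model. The class below is a hypothetical, dependency-free illustration of that contract, not deeptab code.

```python
class TinyEstimator:
    """Hypothetical estimator showing the get_params/set_params contract."""

    def __init__(self, lr=1e-3, max_epochs=100):
        self.lr = lr
        self.max_epochs = max_epochs

    def get_params(self, deep=True):
        # With nested estimators, deep=True would also return their
        # parameters as "<component>__<param>"; there are none here.
        return {"lr": self.lr, "max_epochs": self.max_epochs}

    def set_params(self, **parameters):
        valid = self.get_params(deep=False)
        for name, value in parameters.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            setattr(self, name, value)
        return self  # returning self allows chaining


est = TinyEstimator().set_params(lr=0.01)
print(est.get_params())  # {'lr': 0.01, 'max_epochs': 100}
```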

class deeptab.models.experimental.TangosClassifier(*args: Any, **kwargs: Any)[source]

Tangos classifier. This class extends the SklearnBaseClassifier class and uses the Tangos model with the default Tangos configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TANGOS model, an MLP-based architecture, with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the TANGOS network.

  • activation (callable, default=nn.ReLU()) – Activation function for the TANGOS layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the TANGOS network.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the TANGOS network.

  • skip_connections (bool, default=False) – Whether to use skip connections in the TANGOS network.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
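The default numerical_preprocessing="ple" refers to piecewise linear encoding, which turns a scalar into one component per bin: bins lying entirely below the value contribute 1, bins lying entirely above it contribute 0, and the bin containing the value contributes the fractional position inside it. The sketch below is a simplified version that clips every component to [0, 1] and takes precomputed, sorted bin edges; it is not deeptab's implementation.

```python
def piecewise_linear_encode(x, edges):
    """Simplified piecewise linear encoding of scalar x given sorted bin edges.

    Each bin [left, right) contributes 1.0 if x lies at or beyond its right edge,
    0.0 if x lies below its left edge, and the fractional position otherwise.
    """
    if len(edges) < 2:
        raise ValueError("need at least two edges")
    encoding = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x >= right:
            encoding.append(1.0)
        elif x < left:
            encoding.append(0.0)
        else:
            encoding.append((x - left) / (right - left))
    return encoding


print(piecewise_linear_encode(2.5, [0.0, 1.0, 2.0, 4.0]))  # [1.0, 1.0, 0.25]
```

Unlike one-hot binning, this encoding preserves the ordering of values and their position within a bin.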

Examples

>>> from deeptab.models.experimental import TangosClassifier
>>> model = TangosClassifier(layer_sizes=[256, 128, 32], dropout=0.2)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
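The metrics contract described above, where each value is a (metric_fn, needs_proba) tuple, can be sketched as a small dispatch loop. The function evaluate_with_metrics and the two toy metrics are hypothetical stand-ins that only mirror the documented behavior; predict and predict_proba are passed in as plain callables.

```python
def evaluate_with_metrics(predict, predict_proba, X, y_true, metrics):
    """Hypothetical stand-in for evaluate().

    metrics maps a name to (metric_fn, needs_proba): metric_fn(y_true, probabilities)
    is used when needs_proba is True, metric_fn(y_true, labels) otherwise.
    """
    scores = {}
    labels = probas = None  # computed lazily, at most once each
    for name, (metric_fn, needs_proba) in metrics.items():
        if needs_proba:
            if probas is None:
                probas = predict_proba(X)
            scores[name] = metric_fn(y_true, probas)
        else:
            if labels is None:
                labels = predict(X)
            scores[name] = metric_fn(y_true, labels)
    return scores


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_true_class_proba(y_true, proba):
    return sum(row[t] for t, row in zip(y_true, proba)) / len(y_true)


scores = evaluate_with_metrics(
    predict=lambda X: [0, 1, 1],
    predict_proba=lambda X: [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]],
    X=None,
    y_true=[0, 1, 0],
    metrics={"accuracy": (accuracy, False), "true_class_proba": (mean_true_class_proba, True)},
)
print(scores)
```

With scikit-learn, a matching dictionary would look like {"accuracy": (accuracy_score, False), "log_loss": (log_loss, True)}.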

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where model checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to log during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
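Assuming the classifier's head emits one logit per class, predict_proba corresponds to a softmax over those logits and predict to the index of the most probable class (before any mapping back to the original label values). The sketch below shows that relationship in pure Python; the softmax output layer and the argmax decision rule are assumptions, not documented deeptab behavior.

```python
import math


def softmax(logits):
    """Convert one row of class logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def labels_from_proba(proba_rows):
    """Pick the index of the most probable class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in proba_rows]


proba = [softmax(row) for row in [[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]]]
print(labels_from_proba(proba))  # [0, 2]
```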

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
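The default metric pairs sklearn.metrics.log_loss with True, meaning it is computed from predict_proba output. For integer-encoded labels, log loss is the mean of −log p(true class); the sketch below computes that directly and clips probabilities away from 0 and 1 to keep the logarithm finite. It is a simplified stand-in: scikit-learn's log_loss also handles arbitrary label values and applies its own clipping defaults.

```python
import math


def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean cross-entropy for integer class indices and rows of class probabilities."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)  # clip so log() stays finite
        total -= math.log(p)
    return total / len(y_true)


print(round(multiclass_log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]]), 4))  # 0.6931, i.e. log(2)
```

Confident wrong predictions are punished heavily: a true-class probability near 0 drives the loss toward -log(eps).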

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.TangosRegressor(*args: Any, **kwargs: Any)[source]

Tangos regressor. This class extends the SklearnBaseRegressor class and uses the Tangos model with the default Tangos configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TANGOS model, an MLP-based architecture, with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the TANGOS network.

  • activation (callable, default=nn.ReLU()) – Activation function for the TANGOS layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the TANGOS network.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the TANGOS network.

  • skip_connections (bool, default=False) – Whether to use skip connections in the TANGOS network.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
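The binning_strategy and n_bins parameters decide where bin edges fall before discretization or PLE. The pure-Python sketch below contrasts the two documented strategies on a small sample that contains an outlier. It assumes the usual definitions (equal-width bins for "uniform", equal-frequency bins for "quantile") and is not deeptab's internal code; the helper bin_edges is hypothetical.

```python
def bin_edges(values, n_bins, strategy="uniform"):
    """Return n_bins + 1 bin edges for a list of numeric values."""
    xs = sorted(values)
    lo, hi = xs[0], xs[-1]
    if strategy == "uniform":
        # Equal-width bins spanning the observed minimum and maximum.
        width = (hi - lo) / n_bins
        return [lo + i * width for i in range(n_bins + 1)]
    if strategy == "quantile":
        # Equal-frequency bins: edges at evenly spaced positions in the
        # sorted data (nearest-rank quantiles, no interpolation).
        last = len(xs) - 1
        return [xs[round(i * last / n_bins)] for i in range(n_bins + 1)]
    raise ValueError(f"unknown strategy: {strategy!r}")


sample = [1, 2, 3, 4, 100]
uniform_edges = bin_edges(sample, n_bins=4, strategy="uniform")
quantile_edges = bin_edges(sample, n_bins=4, strategy="quantile")
```

With the outlier present, the uniform edges leave most bins nearly empty, while the quantile edges follow the bulk of the data.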

Examples

>>> from deeptab.models.experimental import TangosRegressor
>>> model = TangosRegressor(layer_sizes=[256, 128, 32], dropout=0.2)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built regressor.

Return type:

object
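The lr, lr_patience, and lr_factor parameters describe a reduce-on-plateau learning-rate schedule. The minimal sketch below assumes behaviour similar to PyTorch's ReduceLROnPlateau with no cooldown and a strict-improvement test; plateau_schedule is a hypothetical helper for illustration, not part of deeptab.

```python
def plateau_schedule(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by lr_factor once the validation loss has failed
    to improve for more than lr_patience consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Strict improvement: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                lr *= lr_factor
                bad_epochs = 0  # start counting again after a reduction
        history.append(lr)
    return history


lrs = plateau_schedule([1.0, 0.9, 0.9, 0.9, 0.9, 0.8], lr=1.0, lr_patience=2, lr_factor=0.5)
```

Here the loss stalls at 0.9 for three epochs, so the rate is halved on the fifth epoch and stays there once the loss improves again.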

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
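As the Notes state, evaluate generates predictions and then applies each metric in the dictionary. The sketch below reproduces that loop on plain lists, using pure-Python stand-ins for scikit-learn's regression metrics; evaluate_predictions is a hypothetical name.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_predictions(y_true, y_pred, metrics):
    """Apply every metric(y_true, y_pred) in the dict and collect scores by name."""
    return {name: metric(y_true, y_pred) for name, metric in metrics.items()}


scores = evaluate_predictions(
    [0.0, 0.0, 0.0, 0.0],
    [1.0, -1.0, 2.0, 0.0],
    {"MSE": mean_squared_error, "MAE": mean_absolute_error},
)
```

With a fitted model, a dictionary of the same shape would be passed directly, for example model.evaluate(X_test, y_test, metrics={"MSE": mean_squared_error}) using sklearn.metrics.mean_squared_error.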

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
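The patience, monitor, and mode parameters control early stopping. The sketch below assumes semantics like PyTorch Lightning's EarlyStopping callback with min_delta=0: epochs without improvement in the monitored value are counted, and training stops once the count reaches patience. The helper early_stopping_epoch is hypothetical.

```python
import operator


def early_stopping_epoch(history, patience, mode="min"):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    improved = operator.lt if mode == "min" else operator.gt
    best = None
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if best is None or improved(value, best):
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_loss = [0.50, 0.40, 0.45, 0.41, 0.42]
stop_min = early_stopping_epoch(val_loss, patience=3, mode="min")
stop_max = early_stopping_epoch([0.70, 0.80, 0.75], patience=1, mode="max")
```

With monitor='val_loss' and mode='min', the history above stops at epoch 5; a metric such as accuracy would use mode='max'.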

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.
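In PyTorch, counting trainable parameters is commonly written as sum(p.numel() for p in model.parameters() if p.requires_grad). The sketch below mirrors the requires_grad switch without needing torch; Param and count_params are hypothetical stand-ins for tensors and for this method.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Param:
    """Hypothetical stand-in for a tensor parameter (shape + trainability)."""
    shape: tuple
    requires_grad: bool = True


def count_params(params, requires_grad=True):
    """Sum element counts, skipping frozen parameters when requires_grad is True."""
    return sum(prod(p.shape) for p in params if p.requires_grad or not requires_grad)


params = [
    Param((4, 10)),                        # linear weight: 40 elements
    Param((4,)),                           # linear bias: 4 elements
    Param((100, 8), requires_grad=False),  # frozen embedding: 800 elements
]
trainable = count_params(params)
total = count_params(params, requires_grad=False)
```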

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
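The pretraining objective is described only as contrastive learning over neighbourhood structure. The sketch below is one plausible formulation for the default positive-pair setting (use_positive=True, use_negative=False): an InfoNCE-style loss in which each row's k_neighbors nearest rows in feature space are positives and similarities are scaled by 1/temperature. It is an assumption for illustration, not deeptab's actual loss, and it does not model pool_sequence.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbors(features, i, k):
    """Indices of the k rows closest to row i (squared Euclidean), excluding i."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(features[i], row)), j)
        for j, row in enumerate(features)
        if j != i
    )
    return [j for _, j in dists[:k]]


def neighbor_contrastive_loss(features, embeddings, k_neighbors, temperature):
    """Mean negative log-softmax probability assigned to feature-space neighbours."""
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Temperature-scaled cosine similarities from anchor i to every other row.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_norm = math.log(sum(math.exp(v) for v in logits.values()))
        positives = nearest_neighbors(features, i, k_neighbors)
        total += -sum(logits[j] - log_norm for j in positives) / len(positives)
    return total / n


features = [[0, 0], [0, 1], [10, 10], [10, 11]]
aligned = [[1, 0], [1, 0.1], [0, 1], [0.1, 1]]     # neighbours embed close together
misaligned = [[1, 0], [0, 1], [1, 0.1], [0.1, 1]]  # neighbours embed far apart
loss_aligned = neighbor_contrastive_loss(features, aligned, k_neighbors=1, temperature=0.1)
loss_misaligned = neighbor_contrastive_loss(features, misaligned, k_neighbors=1, temperature=0.1)
```

Embeddings that keep feature-space neighbours close receive a much lower loss, which is the signal pretraining would push the embedding layer toward.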

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float
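Any callable with the signature metric(y_true, y_pred) can be passed as metric. The sketch below defines a root mean squared error in that shape; the function name is illustrative only.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """RMSE with the metric(y_true, y_pred) signature that score() expects."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


rmse = root_mean_squared_error([0.0, 0.0, 0.0, 0.0], [3.0, -3.0, 3.0, -3.0])
```

On a fitted model this would be used as model.score(X_test, y_test, metric=root_mean_squared_error).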

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.TangosLSS(*args: Any, **kwargs: Any)[source]

Tangos for distributional regression. This class extends the SklearnBaseLSS class and uses the Tangos model with the default Tangos configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the default TANGOS model, a Multi-Layer Perceptron architecture, with predefined hyperparameters.

Parameters:
  • layer_sizes (list, default=(256, 128, 32)) – Sizes of the layers in the TANGOS network.

  • activation (callable, default=nn.ReLU()) – Activation function for the TANGOS layers.

  • skip_layers (bool, default=False) – Whether to skip layers in the TANGOS network.

  • dropout (float, default=0.2) – Dropout rate for regularization.

  • use_glu (bool, default=False) – Whether to use Gated Linear Units (GLU) in the TANGOS network.

  • skip_connections (bool, default=False) – Whether to use skip connections in the TANGOS network.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
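The default numerical_preprocessing="ple" refers to piecewise linear encoding. The sketch below follows the formulation in Gorishniy et al., "On Embeddings for Numerical Features in Tabular Deep Learning": bins fully below a value encode as 1, bins fully above encode as 0, and the bin containing the value stores its fractional position, with the outer bins left unclipped so values outside the edge range extrapolate. Whether deeptab matches this exactly is an assumption; piecewise_linear_encode is a hypothetical helper.

```python
def piecewise_linear_encode(x, edges):
    """Encode scalar x into len(edges) - 1 components, one per bin."""
    n_bins = len(edges) - 1
    encoding = []
    for t in range(n_bins):
        left, right = edges[t], edges[t + 1]
        ratio = (x - left) / (right - left)
        if t > 0:
            ratio = max(ratio, 0.0)  # bins above x contribute 0
        if t < n_bins - 1:
            ratio = min(ratio, 1.0)  # bins below x saturate at 1
        encoding.append(ratio)
    return encoding


edges = [0, 1, 2, 4]
inside = piecewise_linear_encode(1.5, edges)
above = piecewise_linear_encode(5, edges)
below = piecewise_linear_encode(-1, edges)
```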

Examples

>>> from deeptab.models.experimental import TangosLSS
>>> model = TangosLSS(layer_sizes=[256, 128, 32], dropout=0.2)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built distributional regressor.

Return type:

object
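The val_size, X_val, y_val, and random_state parameters interact as follows: an explicit validation set disables splitting, otherwise a seeded split holds out a fraction of the rows. The sketch below assumes a shuffle-then-slice split that rounds the validation count up, as scikit-learn's train_test_split does; resolve_validation and train_val_indices are hypothetical helpers.

```python
import math
import random


def train_val_indices(n_samples, val_size=0.2, random_state=101):
    """Seeded shuffle, then hold out ceil(val_size * n_samples) rows."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    n_val = math.ceil(val_size * n_samples)
    return indices[n_val:], indices[:n_val]


def resolve_validation(X, y, X_val=None, y_val=None, val_size=0.2, random_state=101):
    """Return ((X_train, y_train), (X_val, y_val)) following the documented rules."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return (X, y), (X_val, y_val)  # no split: val_size is ignored
    train_idx, val_idx = train_val_indices(len(X), val_size, random_state)
    train = ([X[i] for i in train_idx], [y[i] for i in train_idx])
    val = ([X[i] for i in val_idx], [y[i] for i in val_idx])
    return train, val


X = list(range(10))
y = [2 * v for v in X]
train, val = resolve_validation(X, y)
```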

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method generates predictions with the predict method and computes each metric on them.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training targets.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation targets.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The metric to use. Currently, only the negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
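For family='normal', the negative log-likelihood of an observation y under a predicted mean μ and standard deviation σ is ½·log(2πσ²) + (y − μ)²/(2σ²). The sketch below averages that quantity over samples. It assumes the model predicts a mean and a standard deviation per row; whether deeptab averages or sums, and how predict exposes the distribution parameters, is not stated in this reference. gaussian_nll is a hypothetical helper.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of observations y under N(mu_i, sigma_i^2)."""
    total = 0.0
    for yi, m, s in zip(y, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (yi - m) ** 2 / (2 * s ** 2)
    return total / len(y)


exact = gaussian_nll([0.0], [0.0], [1.0])  # perfect mean: only the log-variance term
off = gaussian_nll([2.0], [0.0], [1.0])    # same sigma, error of 2 adds 2**2 / 2
```

Lower values are better: a prediction is penalised both for a mean far from the target and for an overconfident (too small) σ.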

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.experimental.TromptClassifier(*args: Any, **kwargs: Any)[source]

Trompt classifier. This class extends the SklearnBaseClassifier class and uses the Trompt model with the default Trompt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Trompt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_cycles (int, default=6) – Number of cycles in the Trompt model.

  • n_cells (int, default=4) – Number of cells in each cycle.

  • P (int, default=128) – Number of prompts in the Trompt model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models.experimental import TromptClassifier
>>> model = TromptClassifier(d_model=64, n_cycles=6)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g., torchmetrics objects) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader class.

Returns:

self – The built classifier.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
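Each metrics value is a (metric_function, needs_proba) tuple, so the boolean decides whether a metric sees predicted probabilities or predicted labels. The sketch below shows that dispatch with pure-Python stand-ins for accuracy and log loss, taking labels as the arg-max of each probability row; evaluate_classifier is a hypothetical helper, not deeptab's code.

```python
import math


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, proba, eps=1e-15):
    # Mean negative log-probability assigned to the true class.
    return -sum(math.log(max(row[t], eps)) for t, row in zip(y_true, proba)) / len(y_true)


def evaluate_classifier(y_true, proba, metrics):
    """metrics maps name -> (metric_fn, needs_proba)."""
    labels = [max(range(len(row)), key=row.__getitem__) for row in proba]
    return {
        name: fn(y_true, proba if needs_proba else labels)
        for name, (fn, needs_proba) in metrics.items()
    }


y_true = [0, 1, 1]
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
scores = evaluate_classifier(
    y_true, proba, {"Accuracy": (accuracy, False), "Log Loss": (log_loss, True)}
)
```

With scikit-learn, the equivalent call on a fitted model would be model.evaluate(X_test, y_test, metrics={"Accuracy": (accuracy_score, False), "Log Loss": (log_loss, True)}).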

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
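If X_val is None, fit holds out a val_size fraction of the training rows, and random_state makes that split repeatable. If X_val is given, it is used as-is. The sketch below illustrates these rules and assumes a simple shuffled hold-out. The document does not say whether deeptab stratifies the split or how it rounds the validation size. train_val_split is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def train_val_split(
    X: Sequence,
    y: Sequence,
    val_size: float = 0.2,
    X_val: Optional[Sequence] = None,
    y_val: Optional[Sequence] = None,
    random_state: int = 101,
) -> Tuple[List, List, List, List]:
    """Return (X_train, y_train, X_val, y_val) following the rules described for fit()."""
    if X_val is not None:
        # An explicit validation set is used as-is and val_size is ignored.
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return list(X), list(y), list(X_val), list(y_val)
    # Otherwise hold out a shuffled fraction; the seed makes the split reproducible.
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)
    n_val = int(round(len(indices) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_tr, y_tr, X_va, y_va = train_val_split(X, y, val_size=0.2, random_state=101)
```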

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)
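For a classifier, predict and predict_proba are normally linked by an argmax. Each row of predict_proba sums to one, and the predicted label is the class with the highest probability. The sketch below shows that relationship. It assumes the usual argmax rule and a hypothetical classes list that maps column positions to labels; the document does not state how deeptab orders the probability columns.

```python
from typing import List, Sequence


def labels_from_proba(proba: Sequence[Sequence[float]], classes: Sequence) -> List:
    """Map each probability row (length n_classes) to the label with the highest score."""
    labels = []
    for row in proba:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("each row of class probabilities should sum to 1")
        best = max(range(len(row)), key=lambda j: row[j])  # argmax over columns
        labels.append(classes[best])
    return labels


proba = [[0.1, 0.7, 0.2], [0.5, 0.25, 0.25]]
labels = labels_from_proba(proba, classes=["cat", "dog", "bird"])
```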

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.
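The loss is not specified here, so the following is only a generic sketch of a neighbour-based contrastive objective built from the same ingredients:

- For each row, its k_neighbors nearest rows in the input feature space are the positives.
- Embedding similarities are divided by temperature.
- The loss is the negative log of the positives' share of the softmax (an InfoNCE-style formulation).

This is not deeptab's _pretrain implementation. use_positive, use_negative and pool_sequence are not modelled, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def neighbour_contrastive_loss(
    features: List[List[float]],
    embeddings: List[List[float]],
    k_neighbors: int = 10,
    temperature: float = 0.1,
) -> float:
    n = len(features)
    k = min(k_neighbors, n - 1)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Positives: the k rows closest to row i in the input feature space.
        ranked = sorted(others, key=lambda j: squared_distance(features[i], features[j]))
        positives = ranked[:k]
        # Temperature-scaled cosine similarities in the embedding space.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature for j in others}
        m = max(logits.values())  # subtract the max before exp for numerical stability
        denom = sum(math.exp(v - m) for v in logits.values())
        numer = sum(math.exp(logits[j] - m) for j in positives)
        total += -math.log(numer / denom)
    return total / n


features = [[0.0], [0.1], [5.0], [5.1]]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
aligned_loss = neighbour_contrastive_loss(features, aligned, k_neighbors=1)
misaligned_loss = neighbour_contrastive_loss(features, misaligned, k_neighbors=1)
```

Embeddings that keep feature-space neighbours close together give a much lower loss than embeddings that pull them apart. Pretraining aims for that property.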

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float
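With the default metric=(log_loss, True), score passes predicted probabilities to the log loss, where lower is better. The function below is a plain-Python sketch of that quantity for integer labels 0..n_classes-1. multiclass_log_loss is a hypothetical helper. sklearn.metrics.log_loss itself accepts more label types and handles more edge cases.

```python
import math
from typing import Sequence


def multiclass_log_loss(
    y_true: Sequence[int], proba: Sequence[Sequence[float]], eps: float = 1e-15
) -> float:
    """Mean negative log-probability assigned to the true class (lower is better)."""
    if len(y_true) != len(proba):
        raise ValueError("y_true and proba must have the same number of rows")
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


score = multiclass_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```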

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.TromptRegressor(*args: Any, **kwargs: Any)[source]

Trompt regressor. This class extends the SklearnBaseRegressor class and uses the Trompt model with the default Trompt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Trompt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_cycles (int, default=6) – Number of cycles in the Trompt model.

  • n_cells (int, default=4) – Number of cells in each cycle.

  • P (int, default=128) – Number of steps in the Trompt model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
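The numerical_preprocessing="ple" and binning_strategy options can be illustrated together. Bin edges are either evenly spaced between the minimum and maximum ("uniform") or placed at empirical quantiles ("quantile"). Piecewise linear encoding (PLE) then turns a value into one entry per bin. The sketch below uses a clipped variant of PLE and linear-interpolation quantiles. It illustrates the general technique and is not deeptab's preprocessor, which may, for instance, use decision-tree bins when use_decision_tree_bins=True. All function names are hypothetical.

```python
from typing import List, Sequence


def uniform_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """n_bins + 1 evenly spaced edges between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def quantile_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """n_bins + 1 edges at evenly spaced empirical quantiles (linear interpolation)."""
    s = sorted(values)
    edges = []
    for i in range(n_bins + 1):
        pos = i * (len(s) - 1) / n_bins
        lo_idx = int(pos)
        hi_idx = min(lo_idx + 1, len(s) - 1)
        frac = pos - lo_idx
        edges.append(s[lo_idx] + frac * (s[hi_idx] - s[lo_idx]))
    return edges


def ple_encode(x: float, edges: Sequence[float]) -> List[float]:
    """Clipped PLE: 1 for bins fully below x, a fraction for the bin containing x, 0 above."""
    out = []
    for left, right in zip(edges[:-1], edges[1:]):
        if x >= right:
            out.append(1.0)
        elif x <= left:
            out.append(0.0)
        else:
            out.append((x - left) / (right - left))
    return out


values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
u_edges = uniform_edges(values, n_bins=4)
q_edges = quantile_edges(values, n_bins=4)
encoded = ple_encode(3.0, q_edges)
encoded_uniform = ple_encode(3.0, u_edges)
```

In this example, a single outlier (100) stretches the uniform edges so that almost every value falls into the first bin. The quantile edges keep the bins populated.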

Examples

>>> from deeptab.models.experimental import TromptRegressor
>>> model = TromptRegressor(d_model=64, n_cycles=8)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
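In the regressor variant, each value in metrics is a plain callable metric(y_true, y_pred) applied to the output of predict. The sketch below mirrors that shape with hand-written metrics and precomputed predictions. evaluate_predictions and the metric functions are hypothetical; with deeptab, one would typically pass functions such as sklearn.metrics.mean_squared_error directly.

```python
import math
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(mean_squared_error(y_true, y_pred))


def evaluate_predictions(
    y_true: Sequence[float], y_pred: Sequence[float], metrics: Dict[str, Metric]
) -> Dict[str, float]:
    """Apply every metric to the same predictions, keyed by metric name."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


scores = evaluate_predictions(
    y_true=[3.0, -0.5, 2.0, 7.0],
    y_pred=[2.5, 0.0, 2.0, 8.0],  # stands in for model.predict(X)
    metrics={
        "mse": mean_squared_error,
        "mae": mean_absolute_error,
        "rmse": root_mean_squared_error,
    },
)
```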

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
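The parameters patience, mode, lr_patience and lr_factor interact across epochs of the monitored metric. The replay below sketches that interaction. It assumes semantics similar to PyTorch Lightning's EarlyStopping, which stops after patience epochs without improvement. It also assumes semantics similar to PyTorch's ReduceLROnPlateau, which multiplies the learning rate by lr_factor after more than lr_patience bad epochs.

This is a simplified illustration:

- It ignores min_delta, cooldown and checkpointing.
- It shares one "best" value between both rules.
- It takes explicit values for lr, lr_patience and lr_factor instead of deeptab's defaults.

replay_training is hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def replay_training(
    val_losses: Sequence[float],
    lr: float,
    lr_patience: int,
    lr_factor: float,
    patience: int = 15,
    mode: str = "min",
) -> Tuple[int, List[float]]:
    """Replay per-epoch values of the monitored metric; return (last_epoch, lr_per_epoch)."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    improved: Callable[[float, float], bool] = (
        (lambda new, best: new < best) if mode == "min" else (lambda new, best: new > best)
    )
    best: Optional[float] = None
    epochs_without_improvement = 0  # drives early stopping
    bad_epochs_for_lr = 0  # drives learning-rate reduction
    lr_history: List[float] = []
    for epoch, value in enumerate(val_losses):
        if best is None or improved(value, best):
            best = value
            epochs_without_improvement = 0
            bad_epochs_for_lr = 0
        else:
            epochs_without_improvement += 1
            bad_epochs_for_lr += 1
        # Reduce the LR once the metric has stalled for more than lr_patience epochs.
        if bad_epochs_for_lr > lr_patience:
            lr *= lr_factor
            bad_epochs_for_lr = 0
        lr_history.append(lr)
        # Stop once `patience` epochs have passed without improvement.
        if epochs_without_improvement >= patience:
            return epoch, lr_history
    return len(val_losses) - 1, lr_history


stopped_at, lrs = replay_training(
    [1.0, 0.8, 0.7] + [0.75] * 10, lr=1.0, lr_patience=2, lr_factor=0.5, patience=4
)
```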

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training target values.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation target values.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to the neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.experimental.TromptLSS(*args: Any, **kwargs: Any)[source]
Trompt for distributional regression.

This class extends the SklearnBaseLSS class and uses the Trompt model with the default Trompt configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the Trompt model with predefined hyperparameters.

Parameters:
  • d_model (int, default=128) – Dimensionality of the transformer model.

  • n_cycles (int, default=6) – Number of cycles in the Trompt model.

  • n_cells (int, default=4) – Number of cells in each cycle.

  • P (int, default=128) – Number of steps in the Trompt model.

  • feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.

  • n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).

  • numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).

  • categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).

  • use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.

  • binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.

  • task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).

  • cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.

  • treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.

  • degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.

  • scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).

  • n_knots (int, default=64) – Number of knots used in spline-based feature expansions.

  • use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.

  • knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).

  • spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).

  • min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models.experimental import TromptLSS
>>> model = TromptLSS(d_model=64, n_cycles=8)
>>> model.fit(X_train, y_train, family="normal")
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

  • distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the distributional regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary of metrics (e.g. torchmetrics objects) to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training target values.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation target values.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The scoring metric. Currently, only negative log-likelihood is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
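For the 'normal' family, the negative log-likelihood of a target y under a predicted mean mu and standard deviation sigma is 0.5 * log(2 * pi * sigma^2) + (y - mu)^2 / (2 * sigma^2), so lower scores are better. The sketch below computes it with the standard library. Averaging over samples and passing mu and sigma as separate lists are assumptions made for illustration, not a description of how deeptab parameterizes or reduces the score.

```python
import math


def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood of y_true under N(mu, sigma^2), per sample."""
    if not (len(y_true) == len(mu) == len(sigma)):
        raise ValueError("y_true, mu and sigma must have the same length.")
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        if s <= 0:
            raise ValueError("sigma must be strictly positive.")
        total += 0.5 * math.log(2 * math.pi * s**2) + (y - m) ** 2 / (2 * s**2)
    return total / len(y_true)


# A perfect mean prediction with unit variance still costs 0.5 * log(2 * pi) per sample.
print(round(gaussian_nll([0.0, 1.0], [0.0, 1.0], [1.0, 1.0]), 4))  # 0.9189
```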

set_params(**parameters)

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object
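get_params and set_params follow the scikit-learn estimator convention: the constructor arguments are the estimator's parameters, and set_params returns the estimator itself. A minimal sketch of that round trip with a hypothetical ToyEstimator (the parameter names are borrowed from the Mambular configuration purely as examples):

```python
class ToyEstimator:
    """Hypothetical sklearn-style estimator: __init__ args are the public params."""

    def __init__(self, d_model=64, n_layers=4, dropout=0.0):
        self.d_model = d_model
        self.n_layers = n_layers
        self.dropout = dropout

    def get_params(self, deep=True):
        # deep is accepted for API compatibility; this toy has no nested estimators.
        return {"d_model": self.d_model, "n_layers": self.n_layers, "dropout": self.dropout}

    def set_params(self, **parameters):
        valid = self.get_params()
        for name, value in parameters.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}.")
            setattr(self, name, value)
        return self  # returning self allows chaining


est = ToyEstimator().set_params(d_model=128, dropout=0.1)
print(est.get_params())  # {'d_model': 128, 'n_layers': 4, 'dropout': 0.1}

# Build a fresh, unfitted copy from the current parameters.
clone = ToyEstimator(**est.get_params())
```

Rebuilding an estimator from its own get_params() output is the pattern scikit-learn's clone utility relies on, which is why following this convention matters for compatibility with tools such as GridSearchCV.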

class deeptab.models.SklearnBaseClassifier(*args: Any, **kwargs: Any)[source]
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})[source]

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built classifier.

Return type:

object
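When X_val is not given, a fraction val_size of the rows is held out for validation and random_state makes the shuffle reproducible; when X_val is given, no split happens. The stdlib-only sketch below illustrates that contract. The helper split_train_val is hypothetical, and details of deeptab's own split (for example whether it stratifies by class) are not specified here.

```python
import random


def split_train_val(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Hypothetical helper returning (X_train, y_train, X_val, y_val)."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided.")
        return X, y, X_val, y_val  # no split; val_size is ignored
    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # reproducible shuffle
    n_val = int(round(len(X) * val_size))
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )


X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_tr, y_tr, X_va, y_va = split_train_val(X, y, val_size=0.2)
print(len(X_tr), len(X_va))  # 8 2
```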

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.
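Encoding passes the data through the fitted embedding layer in chunks of batch_size rows and concatenates the results. The sketch below shows only that batching pattern, with plain lists standing in for tensors; toy_embed and encode_in_batches are hypothetical names, not deeptab functions.

```python
def toy_embed(batch):
    """Hypothetical embedding layer: maps each row to [sum, mean] of its features."""
    return [[sum(row), sum(row) / len(row)] for row in batch]


def encode_in_batches(X, embed, batch_size=64):
    """Apply `embed` to consecutive slices of X and concatenate the outputs."""
    if embed is None:
        raise ValueError("The model must be fitted before encoding.")
    outputs, n_batches = [], 0
    for start in range(0, len(X), batch_size):
        outputs.extend(embed(X[start:start + batch_size]))  # one forward pass per batch
        n_batches += 1
    return outputs, n_batches


X = [[1.0, 2.0, 3.0]] * 150
Z, n_batches = encode_in_batches(X, toy_embed, batch_size=64)
print(len(Z), n_batches, Z[0])  # 150 3 [6.0, 2.0]
```

A larger batch_size means fewer forward passes at the cost of more memory per pass.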

evaluate(X, y_true, embeddings=None, metrics=None)[source]

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • embeddings (array-like or list of shape (n_samples, dimension), optional) – Embeddings for unstructured data inputs, given as a list or array.

  • metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
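Each entry in metrics pairs a metric function with a flag that decides which prediction method feeds it: predict_proba when the flag is True, predict when it is False. The stdlib sketch below implements that dispatch. ToyModel, accuracy and proba_log_loss are hypothetical stand-ins; scikit-learn's accuracy_score and log_loss follow the same metric(y_true, y_pred) calling convention and could be used in their place, e.g. {"accuracy": (accuracy_score, False), "log_loss": (log_loss, True)}.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def proba_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for t, row in zip(y_true, proba):
        p = min(max(row[t], eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


class ToyModel:
    """Hypothetical fitted binary classifier with fixed probabilities."""

    def predict_proba(self, X):
        return [[0.2, 0.8] if x > 0 else [0.9, 0.1] for x in X]

    def predict(self, X):
        # Label = index of the largest class probability.
        return [max(range(len(row)), key=row.__getitem__) for row in self.predict_proba(X)]


def evaluate(model, X, y_true, metrics):
    """metrics maps name -> (metric_fn, needs_proba)."""
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        y_pred = model.predict_proba(X) if needs_proba else model.predict(X)
        scores[name] = metric_fn(y_true, y_pred)
    return scores


metrics = {"accuracy": (accuracy, False), "log_loss": (proba_log_loss, True)}
scores = evaluate(ToyModel(), [1.0, -1.0, 2.0], [1, 0, 0], metrics)
print(round(scores["accuracy"], 3), round(scores["log_loss"], 3))  # 0.667 0.646
```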

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)[source]

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target class labels.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object
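Early stopping tracks the monitor metric and halts training once it has failed to improve, in the direction given by mode, for patience consecutive epochs. The simulation below replays a recorded metric history to show the epoch at which training would stop. It is modeled on PyTorch Lightning's EarlyStopping callback but ignores its min_delta tolerance; that deeptab configures the callback exactly this way is an assumption, and simulate_early_stopping is a hypothetical name.

```python
def simulate_early_stopping(history, patience=15, mode="min"):
    """Return (best_epoch, stop_epoch); stop_epoch is None if training ran to the end."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'.")
    best, best_epoch, wait = None, None, 0
    for epoch, value in enumerate(history):
        improved = best is None or (value < best if mode == "min" else value > best)
        if improved:
            best, best_epoch, wait = value, epoch, 0
        else:
            wait += 1  # one more epoch without improvement
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


val_loss = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.73]
print(simulate_early_stopping(val_loss, patience=3))   # (2, 5)
print(simulate_early_stopping(val_loss, patience=15))  # (2, None)
```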

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)[source]

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)[source]

Predicts class labels for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class labels.

Returns:

predictions – The predicted class labels.

Return type:

ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)[source]

Predicts class probabilities for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.

Returns:

probabilities – The predicted class probabilities.

Return type:

ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)[source]

Pretrains the embedding layer of the model using a contrastive learning approach.

This method optimizes the embeddings with respect to the neighborhood structure in the feature space and saves them after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None
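The bundle described above can be pictured as a single serialized dictionary holding the config, the fitted preprocessor, feature metadata and the weights. The sketch below round-trips such a dictionary with pickle. The key names, the save_bundle/load_bundle helpers and the use of pickle are assumptions for illustration only, since deeptab's on-disk format is not documented here; for real models, use save() and load().

```python
import os
import pickle
import tempfile


def save_bundle(path, is_fitted, config, preprocessor, feature_info, weights):
    """Hypothetical: write everything inference needs into one file."""
    if not is_fitted:
        raise ValueError("The model has not been fitted yet.")
    bundle = {
        "config": config,              # model hyperparameters
        "preprocessor": preprocessor,  # fitted preprocessing state
        "feature_info": feature_info,  # feature metadata (names, types)
        "state_dict": weights,         # neural-network weights
    }
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_bundle(path):
    """Hypothetical counterpart to save_bundle."""
    with open(path, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "model.pt")
save_bundle(
    path,
    is_fitted=True,
    config={"d_model": 64, "n_layers": 4},
    preprocessor={"means": [0.5, 1.5]},
    feature_info={"num_features": ["a", "b"], "cat_features": []},
    weights={"head.weight": [[0.1, -0.2]]},
)
restored = load_bundle(path)
print(restored["config"])  # {'d_model': 64, 'n_layers': 4}
```

Unpickling can execute arbitrary code, so only load bundles from trusted sources.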

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))[source]

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.

  • metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.

class deeptab.models.SklearnBaseLSS(*args: Any, **kwargs: Any)[source]
build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})[source]

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built distributional regressor.

Return type:

object
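lr_patience and lr_factor describe a reduce-on-plateau schedule. The sketch below replays a validation-loss history and reports the learning rate in effect after each epoch. It follows the default behaviour of torch.optim.lr_scheduler.ReduceLROnPlateau (reduce once the count of non-improving epochs exceeds the patience, then reset the count) but omits its threshold and cooldown options; whether deeptab uses that scheduler is an assumption, and reduce_on_plateau is a hypothetical name.

```python
def reduce_on_plateau(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch (simplified plateau schedule)."""
    best = float("inf")
    num_bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:  # strict improvement; torch's default threshold is omitted
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if num_bad_epochs > lr_patience:  # patience exhausted: shrink the LR
            lr *= lr_factor
            num_bad_epochs = 0
        history.append(lr)
    return history


lrs = reduce_on_plateau([1.0, 0.9, 0.95, 0.96, 0.97, 0.8], lr=1e-3, lr_patience=2, lr_factor=0.1)
print([f"{lr:.0e}" for lr in lrs])  # ['1e-03', '1e-03', '1e-03', '1e-03', '1e-04', '1e-04']
```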

encode(X, batch_size=64)[source]

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)[source]

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

  • distribution_family (str, optional) – The distribution family the model predicts. If None, it is inferred from the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses the predict method to generate predictions and computes each metric on them.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)[source]

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_default_metrics(distribution_family)[source]

Provides default metrics based on the distribution family.

Parameters:

distribution_family (str) – The distribution family for which to provide default metrics.

Returns:

metrics – A dictionary of default metric functions.

Return type:

dict

get_number_of_params(requires_grad=True)[source]

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

classmethod load(path)[source]

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)[source]

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)[source]

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)[source]

Save the fitted model to path.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, metric='NLL')[source]

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (str, default="NLL") – The scoring metric. Currently, only negative log-likelihood is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)[source]

Set the parameters of this estimator.

Parameters:

**parameters (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

object

class deeptab.models.SklearnBaseRegressor(*args: Any, **kwargs: Any)[source]
build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})[source]

Builds the model using the provided training data.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

Returns:

self – The built regressor.

Return type:

object

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:
  • X (array-like or DataFrame) – Input data to be encoded.

  • batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)[source]

Evaluate the model on the given data using specified metrics.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict
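Unlike the classifier's evaluate, the values in metrics here are plain callables, called as metric(y_true, y_pred) with y_pred produced by predict; score's metric argument uses the same signature. The stdlib sketch below shows three such callables and the loop that applies them. evaluate_regression and the example predictions are hypothetical, and scikit-learn's mean_squared_error or mean_absolute_error could be passed instead.

```python
import math


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(mse(y_true, y_pred))


def evaluate_regression(y_true, y_pred, metrics):
    """Apply each metric(y_true, y_pred) to one set of predictions."""
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]  # stand-in for the output of predict(X)
scores = evaluate_regression(y_true, y_pred, {"mse": mse, "mae": mae, "rmse": rmse})
print(scores["mse"], scores["mae"])  # 0.375 0.5
```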

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)[source]

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:
  • X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).

  • val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.

  • X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.

  • y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.

  • max_epochs (int) – Maximum number of epochs for training.

  • random_state (int) – Controls the shuffling applied to the data before applying the split.

  • batch_size (int) – Number of samples per gradient update.

  • shuffle (bool) – Whether to shuffle the training data before each epoch.

  • patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.

  • monitor (str) – The metric to monitor for early stopping.

  • mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).

  • lr (float | None) – Learning rate for the optimizer.

  • lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.

  • lr_factor (float | None) – Factor by which the learning rate will be reduced.

  • weight_decay (float | None) – Weight decay (L2 penalty) coefficient.

  • checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.

  • dataloader_kwargs (dict, default={}) – Keyword arguments passed to the PyTorch DataLoader.

  • train_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during training.

  • val_metrics (dict[str, Callable] | None) – Dictionary mapping metric names to metric callables (e.g. torchmetrics metrics) to be logged during validation.

  • rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.

  • **trainer_kwargs – Additional keyword arguments passed to PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters:

requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.

Returns:

The total number of parameters in the model.

Return type:

int

Raises:

ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters:

path (str) – Path to a file previously written by save().

Returns:

A fully reconstructed, ready-to-predict estimator of the same type that was saved.

Return type:

estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)[source]

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:
  • X (array-like) – Training data.

  • y (array-like) – Training labels.

  • X_val (array-like, optional) – Validation data.

  • y_val (array-like, optional) – Validation labels.

  • time (int) – The number of optimization trials to run.

  • max_epochs (int) – Maximum number of epochs for training.

  • prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).

  • prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.

  • **optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)[source]

Predicts target values for the given input samples.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.

Returns:

predictions – The predicted target values.

Return type:

ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)[source]

Pretrains the embedding layer of the model using a contrastive learning approach.

This method optimizes the embeddings with respect to the neighborhood structure in the feature space and saves them after training.

Parameters:
  • pretrain_epochs (int, default=15) – Number of epochs to run pretraining.

  • k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.

  • temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.

  • save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.

  • lr (float, default=1e-3) – Learning rate for the pretraining optimizer.

  • use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.

  • use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.

  • pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:
  • ValueError – If the model has not been built before calling this method.

  • ValueError – If the model does not contain an embedding layer.

Notes

  • This function requires that self.build_model() has been called beforehand.

  • The pretraining method uses self.task_model.estimator.embedding_layer.

  • The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters:

path (str) – Destination file path (e.g. "model.pt").

Raises:

ValueError – If the model has not been fitted yet.

Return type:

None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)[source]

Calculate the score of the model using the specified metric.

Parameters:
  • X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.

  • metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters)

Set the parameters of this estimator.