TabulaRNN

Recurrent neural network for tabular data. TabulaRNN treats the feature vector as a sequence of tokens and processes it with a recurrent cell. The cell type is configurable: RNN, LSTM, GRU, mLSTM (matrix LSTM), or sLSTM (scalar LSTM from the xLSTM family). This makes it a flexible sequence model that spans classical to modern recurrent architectures.

When to Use

Best suited for datasets where feature ordering encodes meaningful structure — for example, temporally ordered measurements stored as columns. Also a viable alternative to Transformer-based models when memory efficiency is a priority.

Limitations

Performance is sensitive to feature ordering; shuffling columns can significantly change results.
May underperform Transformer architectures on unordered tabular data where positional bias is irrelevant.
The mLSTM and sLSTM variants are newer and less empirically validated.
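
The order sensitivity noted above follows directly from the recurrence: each hidden state depends on every token processed before it. A minimal sketch with a scalar Elman-style cell and average pooling illustrates this. The weights `w_in` and `w_rec`, the function `rnn_pool`, and the input values are hypothetical; the real model embeds each feature into `d_model` dimensions and learns its weights.

```python
import math

def rnn_pool(tokens, w_in=0.5, w_rec=0.9):
    """Run a scalar Elman-style recurrence over feature tokens, then average-pool."""
    h = 0.0
    states = []
    for x in tokens:
        # The new state mixes the current feature with the previous state,
        # so earlier columns influence how later columns are represented.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return sum(states) / len(states)

row = [0.1, 0.4, 0.9, 1.5]        # one sample, columns in their original order
shuffled = [1.5, 0.1, 0.9, 0.4]   # same values, different column order

out_original = rnn_pool(row)
out_shuffled = rnn_pool(shuffled)
print(out_original, out_shuffled)  # the two pooled outputs differ
```

Because the pooled representation changes with column order, it can be worth fixing a deliberate column order before training rather than relying on whatever order the data source produced.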

API Reference

class deeptab.models.TabulaRNNRegressor(*args: Any, **kwargs: Any)

TabulaRNN regressor. This class extends the SklearnBaseRegressor class and uses the TabulaRNN model with the default TabulaRNN configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:

model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.
n_layers (int, default=4) – Number of layers in the RNN.
rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
norm (str, default="RMSNorm") – Normalization method to be used.
activation (callable, default=nn.SELU()) – Activation function for the RNN layers.
residuals (bool, default=False) – Whether to include residual connections in the RNN.
head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).
norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.
layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.
bias (bool, default=True) – Whether to use bias in the linear layers.
rnn_activation (str, default="relu") – Activation function for the RNN layers.
dim_feedforward (int, default=256) – Size of the feedforward network.
d_conv (int, default=4) – Size of the convolutional layer for embedding features.
dilation (int, default=1) – Dilation factor for the convolution.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.
n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).
numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).
categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).
use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.
binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.
task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).
cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.
treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.
degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.
scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).
n_knots (int, default=64) – Number of knots used in spline-based feature expansions.
use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.
knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).
spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).
min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
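
The default numerical preprocessing is "ple" (piecewise linear encoding) with n_bins bins. As a rough illustration of what such an encoding produces, the sketch below uses uniform bin edges and clips every component to [0, 1]. The helper names `uniform_edges` and `ple_encode` are hypothetical, and this is not the library's preprocessing code, which may place bins and handle out-of-range values differently.

```python
def uniform_edges(lo, hi, n_bins):
    """Return n_bins + 1 equally spaced bin edges between lo and hi."""
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]

def ple_encode(x, edges):
    """Encode a scalar as one value per bin: 1 for bins fully below x,
    0 for bins fully above x, and a fraction for the bin containing x."""
    encoded = []
    for left, right in zip(edges[:-1], edges[1:]):
        frac = (x - left) / (right - left)
        encoded.append(min(1.0, max(0.0, frac)))  # clipped variant (an assumption)
    return encoded

edges = uniform_edges(0.0, 8.0, n_bins=4)  # [0.0, 2.0, 4.0, 6.0, 8.0]
print(ple_encode(5.0, edges))              # [1.0, 1.0, 0.5, 0.0]
```

With the default n_bins=64, each numerical feature would expand into a 64-dimensional vector of this kind before embedding.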

Examples

>>> from deeptab.models import TabulaRNNRegressor
>>> model = TabulaRNNRegressor(d_model=64)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.

Returns:

self – The built regressor.

Return type:

object
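
The lr_patience and lr_factor arguments describe a reduce-on-plateau schedule. The sketch below shows that rule in isolation, assuming the same convention as PyTorch's ReduceLROnPlateau (the rate is reduced once the number of non-improving epochs exceeds the patience). The function name and the loss values are hypothetical, and this is not the library's scheduler code.

```python
def reduce_on_plateau(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                lr *= lr_factor   # shrink the learning rate
                bad_epochs = 0    # start counting again
        history.append(lr)
    return history

losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
print(reduce_on_plateau(losses, lr=1.0, lr_patience=2, lr_factor=0.5))
# [1.0, 1.0, 1.0, 1.0, 0.5, 0.5]
```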

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:

X (array-like or DataFrame) – Input data to be encoded.
batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metrics (dict) – A dictionary where keys are metric names and values are the metric functions.

Notes

This method uses the predict method to generate predictions and computes each metric.

Returns: scores – A dictionary with metric names as keys and their corresponding scores as values.
Return type: dict
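
The shape of the metrics argument and of the returned dictionary can be sketched without the library. In the example below, `predict_fn` is a hypothetical stand-in for the fitted model's predict method, and the two metric functions are plain-Python substitutes for metrics with the signature metric(y_true, y_pred).

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def predict_fn(X):
    """Stand-in predictor (hypothetical): the row sum."""
    return [sum(row) for row in X]

def evaluate_with(X, y_true, metrics):
    """Predict once, then apply every metric to the same predictions."""
    y_pred = predict_fn(X)
    return {name: fn(y_true, y_pred) for name, fn in metrics.items()}

X = [[1.0, 2.0], [0.5, 0.5], [2.0, 2.0]]
y_true = [3.0, 2.0, 4.0]
scores = evaluate_with(X, y_true, {"mse": mean_squared_error, "mae": mean_absolute_error})
print(scores)  # one entry per metric name
```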

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
**trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
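
The patience, monitor, and mode arguments together define an early-stopping rule. A minimal sketch of that rule follows, assuming training stops once the monitored value has failed to improve for patience consecutive epochs (the convention used by PyTorch Lightning's EarlyStopping callback). The function name and the metric values are hypothetical.

```python
def early_stop_epoch(values, patience, mode="min"):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = None
    bad_epochs = 0
    for epoch, value in enumerate(values):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

val_loss = [0.9, 0.7, 0.72, 0.71, 0.73, 0.6]
print(early_stop_epoch(val_loss, patience=3, mode="min"))  # 4
```

Note that the last value in the list (0.6) would have been an improvement, which is the usual trade-off of a small patience.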

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns: The total number of parameters in the model.
Return type: int
Raises: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True): Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters: path (str) – Path to a file previously written by save().
Returns: A fully reconstructed, ready-to-predict estimator of the same type that was saved.
Return type: estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target values for the given input samples.

Parameters: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns: predictions – The predicted target values.
Return type: ndarray, shape (n_samples,) or (n_samples, n_outputs)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:

pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:

ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.

Notes

This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.
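
To give a feel for the roles of k_neighbors and temperature, the sketch below implements one common neighborhood-based contrastive objective: for each sample, its k nearest neighbors in the input space are treated as positives, and cosine similarities between embeddings are scaled by the temperature. This formulation is an assumption chosen for illustration; the library's actual loss, including how use_positive and use_negative enter it, may differ. All function names and data are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_indices(X, i, k):
    """Indices of the k samples closest to X[i] in the input space."""
    dists = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in dists[:k]]

def contrastive_loss(X, Z, k_neighbors, temperature):
    """Average negative log-probability that a sample's embedding picks out its neighbors."""
    total = 0.0
    for i in range(len(X)):
        positives = knn_indices(X, i, k_neighbors)
        logits = {j: cosine(Z[i], Z[j]) / temperature for j in range(len(X)) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denominator for p in positives) / len(positives)
    return total / len(X)

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]           # two clusters in input space
Z_aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]   # embeddings respect the clusters
Z_mixed = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]     # embeddings mix the clusters

loss_aligned = contrastive_loss(X, Z_aligned, k_neighbors=1, temperature=0.1)
loss_mixed = contrastive_loss(X, Z_mixed, k_neighbors=1, temperature=0.1)
print(loss_aligned, loss_mixed)  # the aligned embeddings give the lower loss
```

A smaller temperature sharpens the softmax over similarities, so mismatched neighbors are penalized more heavily.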

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters: path (str) – Destination file path (e.g. "model.pt").
Raises: ValueError – If the model has not been fitted yet.
Return type: None

score(X, y, embeddings=None, metric=sklearn.metrics.mean_squared_error)

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (callable, default=mean_squared_error) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters): Set the parameters of this estimator.

class deeptab.models.TabulaRNNClassifier(*args: Any, **kwargs: Any)

TabulaRNN classifier. This class extends the SklearnBaseClassifier class and uses the TabulaRNN model with the default TabulaRNN configuration.

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:

model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.
n_layers (int, default=4) – Number of layers in the RNN.
rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
norm (str, default="RMSNorm") – Normalization method to be used.
activation (callable, default=nn.SELU()) – Activation function for the RNN layers.
residuals (bool, default=False) – Whether to include residual connections in the RNN.
head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).
norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.
layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.
bias (bool, default=True) – Whether to use bias in the linear layers.
rnn_activation (str, default="relu") – Activation function for the RNN layers.
dim_feedforward (int, default=256) – Size of the feedforward network.
d_conv (int, default=4) – Size of the convolutional layer for embedding features.
dilation (int, default=1) – Dilation factor for the convolution.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.
n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).
numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).
categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).
use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.
binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.
task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).
cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.
treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.
degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.
scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).
n_knots (int, default=64) – Number of knots used in spline-based feature expansions.
use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.
knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).
spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).
min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.

Examples

>>> from deeptab.models import TabulaRNNClassifier
>>> model = TabulaRNNClassifier(d_model=64)
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (class labels).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.

Returns:

self – The built classifier.

Return type:

object
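
The interplay between val_size, X_val, y_val, and random_state can be sketched as follows. This is a plain-Python illustration of the documented precedence (an explicit validation set disables the split), not the library's splitting code. The function name and the toy data are hypothetical.

```python
import random

def split_train_val(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101):
    """Return (X_train, y_train, X_val, y_val) following the documented precedence."""
    if X_val is not None:
        if y_val is None:
            raise ValueError("y_val is required when X_val is provided")
        return X, y, X_val, y_val  # explicit validation set: no split, val_size ignored

    indices = list(range(len(X)))
    random.Random(random_state).shuffle(indices)  # seeded shuffle before splitting
    n_val = round(len(X) * val_size)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return (
        [X[i] for i in train_idx],
        [y[i] for i in train_idx],
        [X[i] for i in val_idx],
        [y[i] for i in val_idx],
    )

X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
X_tr, y_tr, X_va, y_va = split_train_val(X, y, val_size=0.2)
print(len(X_tr), len(X_va))  # 8 2
```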

encode(X, embeddings=None, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:

X (array-like or DataFrame) – Input data to be encoded.
batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, embeddings=None, metrics=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.
embeddings (array-like or list of shape (n_samples, dimension)) – List or array with embeddings for unstructured data inputs.
metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.
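
The (metric function, needs-probabilities) tuples can be illustrated with a small dispatch loop. In this sketch, `predict_fn` and `predict_proba_fn` are hypothetical stand-ins for the fitted classifier's predict and predict_proba methods, and the metric functions are simplified plain-Python versions of accuracy and log loss.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, proba):
    return -sum(math.log(row[t]) for t, row in zip(y_true, proba)) / len(y_true)

def predict_proba_fn(X):
    """Stand-in (hypothetical): the single feature is read as P(class 1)."""
    return [[1.0 - row[0], row[0]] for row in X]

def predict_fn(X):
    """Stand-in (hypothetical): the class with the highest probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in predict_proba_fn(X)]

def evaluate_classifier(X, y_true, metrics):
    scores = {}
    for name, (metric_fn, needs_proba) in metrics.items():
        # Probability-based metrics receive predict_proba output,
        # label-based metrics receive predict output.
        y_hat = predict_proba_fn(X) if needs_proba else predict_fn(X)
        scores[name] = metric_fn(y_true, y_hat)
    return scores

X = [[0.1], [0.8], [0.4]]
y_true = [0, 1, 1]
scores = evaluate_classifier(
    X, y_true, {"accuracy": (accuracy, False), "log_loss": (log_loss, True)}
)
print(scores)
```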

fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the classification model using the provided training data. Optionally, a separate validation set can be used.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (class labels).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where checkpoints are saved.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
**trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted classifier.

Return type:

object

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns: The total number of parameters in the model.
Return type: int
Raises: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True): Get parameters for this estimator.

classmethod load(path)

Load and return a fitted model from path.

Parameters: path (str) – Path to a file previously written by save().
Returns: A fully reconstructed, ready-to-predict estimator of the same type that was saved.
Return type: estimator

optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, embeddings=None, device=None)

Predicts target labels for the given input samples.

Parameters: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns: predictions – The predicted class labels.
Return type: ndarray, shape (n_samples,)

predict_proba(X, embeddings=None, device=None)

Predicts class probabilities for the given input samples.

Parameters: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.
Returns: probabilities – The predicted class probabilities.
Return type: ndarray, shape (n_samples, n_classes)

pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)

Pretrains the embedding layer of the model using a contrastive learning approach.

This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.

Parameters:

pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.

Raises:

ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.

Notes

This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.

save(path)

Save the fitted model to path.

The bundle written by this method can be restored with load(). It contains all state required for inference: the config, the fitted preprocessor, feature metadata, and the neural-network weights.

Parameters: path (str) – Destination file path (e.g. "model.pt").
Raises: ValueError – If the model has not been fitted yet.
Return type: None

score(X, y, embeddings=None, metric=(sklearn.metrics.log_loss, True))

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.
metric (tuple, default=(log_loss, True)) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).

Returns:

score – The score calculated using the specified metric.

Return type:

float

set_params(**parameters): Set the parameters of this estimator.

class deeptab.models.TabulaRNNLSS(*args: Any, **kwargs: Any)

TabulaRNN for distributional regression. This class extends the SklearnBaseLSS class and uses the TabulaRNN model with the default TabulaRNN configuration. Supports RNN, LSTM, GRU, mLSTM, and sLSTM architectures.
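
Distributional regression means the model predicts the parameters of a distribution for each sample rather than a single point estimate, and is typically trained by minimizing the negative log-likelihood of the observed targets. The sketch below assumes a normal distribution purely for illustration; which distribution families the class supports is not stated here, and the function name and numbers are hypothetical.

```python
import math

def gaussian_nll(y, mean, std):
    """Negative log-likelihood of y under Normal(mean, std**2)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (y - mean) ** 2 / (2 * std ** 2)

# Two hypothetical predictions for the same observed target y = 2.0.
# Both miss the target by the same amount but report different uncertainty.
confident_wrong = gaussian_nll(2.0, mean=0.0, std=0.5)
uncertain_wrong = gaussian_nll(2.0, mean=0.0, std=2.0)
print(confident_wrong, uncertain_wrong)  # the overconfident prediction is penalized more
```

The likelihood rewards predictions whose stated uncertainty matches the size of their errors, which a point-estimate loss such as mean squared error cannot express.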

Notes

The parameters for this class include the attributes from the config dataclass as well as preprocessing arguments handled by the base class.

Configuration class for the TabulaRNN model with predefined hyperparameters.

Parameters:

model_type (str, default="RNN") – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.
n_layers (int, default=4) – Number of layers in the RNN.
rnn_dropout (float, default=0.2) – Dropout rate for the RNN layers.
d_model (int, default=128) – Dimensionality of embeddings or model representations.
norm (str, default="RMSNorm") – Normalization method to be used.
activation (callable, default=nn.SELU()) – Activation function for the RNN layers.
residuals (bool, default=False) – Whether to include residual connections in the RNN.
head_layer_sizes (list, default=()) – Sizes of the layers in the head of the model.
head_dropout (float, default=0.5) – Dropout rate for the head layers.
head_skip_layers (bool, default=False) – Whether to skip layers in the head.
head_activation (callable, default=nn.SELU()) – Activation function for the head layers.
head_use_batch_norm (bool, default=False) – Whether to use batch normalization in the head layers.
pooling_method (str, default="avg") – Pooling method to be used (‘avg’, ‘cls’, etc.).
norm_first (bool, default=False) – Whether to apply normalization before other operations in each block.
layer_norm_eps (float, default=1e-05) – Epsilon value for layer normalization.
bias (bool, default=True) – Whether to use bias in the linear layers.
rnn_activation (str, default="relu") – Activation function for the RNN layers.
dim_feedforward (int, default=256) – Size of the feedforward network.
d_conv (int, default=4) – Size of the convolutional layer for embedding features.
dilation (int, default=1) – Dilation factor for the convolution.
conv_bias (bool, default=True) – Whether to use bias in the convolutional layers.
feature_preprocessing (dict, optional) – Dictionary mapping feature names to specific preprocessing methods. Overrides global defaults.
n_bins (int, default=64) – Number of bins used for binning-based preprocessing (e.g., for discretizers or PLE).
numerical_preprocessing (str, default="ple") – Preprocessing method for numerical features (e.g., “standardization”, “minmax”, “ple”, “rbf”, etc.).
categorical_preprocessing (str, default="int") – Preprocessing method for categorical features (e.g., “int”, “ordinal”, “onehot”).
use_decision_tree_bins (bool, default=False) – Whether to use decision tree binning for numerical discretization.
binning_strategy (str, default="uniform") – Strategy for bin placement when not using tree-based methods. Options: “uniform”, “quantile”.
task (str, default="regression") – Problem type used to guide preprocessing (e.g., “regression” or “classification”).
cat_cutoff (float or int, default=0.03) – Threshold to determine whether integer-valued features are treated as categorical.
treat_all_integers_as_numerical (bool, default=False) – If True, treat all integer-typed columns as numerical regardless of cardinality.
degree (int, default=3) – Degree of polynomial or spline basis functions where applicable.
scaling_strategy (str, default="minmax") – Strategy for feature scaling (e.g., “standardization”, “minmax”, etc.).
n_knots (int, default=64) – Number of knots used in spline-based feature expansions.
use_decision_tree_knots (bool, default=True) – Whether to use decision tree-based knot placement for spline transformations.
knots_strategy (str, default="uniform") – Strategy for placing knots for splines (“uniform” or “quantile”).
spline_implementation (str, default="sklearn") – Which spline backend implementation to use (e.g., “sklearn”, “custom”).
min_unique_vals (int, default=5) – Minimum number of unique values required for a feature to be treated as numerical.
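
The two binning_strategy options differ in where the bin edges fall. The sketch below illustrates the idea using only the standard library. The helpers uniform_edges and quantile_edges are hypothetical names written for this illustration, not deeptab functions, and the library's own binning may differ in detail.

```python
from statistics import quantiles


def uniform_edges(values, n_bins):
    """Equal-width bin edges spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def quantile_edges(values, n_bins):
    """Bin edges placed so that each bin holds roughly the same number of samples."""
    inner = quantiles(values, n=n_bins, method="inclusive")
    return [min(values)] + inner + [max(values)]


# A skewed toy feature: four small values and one large outlier.
values = [1, 2, 3, 4, 100]
u = uniform_edges(values, 4)   # edges spread evenly over the value range
q = quantile_edges(values, 4)  # edges follow the data density
```

On a skewed feature like this one, the uniform edges leave most bins nearly empty, while the quantile edges concentrate where the samples are.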

Examples

>>> from deeptab.models import TabulaRNNLSS
>>> model = TabulaRNNLSS(model_type='LSTM', d_model=128, n_layers=4)
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)

build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})

Builds the model using the provided training data.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
train_metrics (dict[str, Callable] | None) – Dictionary of metric callables (e.g. from torchmetrics) to be logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary of metric callables (e.g. from torchmetrics) to be logged during validation.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.

Returns:

self – The built distributional regressor.

Return type:

object
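
The interaction between val_size and random_state can be pictured as a seeded hold-out split. The following is a minimal standard-library sketch under that assumption. holdout_split is a hypothetical helper, not part of deeptab, and the library most likely relies on its own splitting utility, so the exact indices will differ.

```python
import random


def holdout_split(n_samples, val_size=0.2, random_state=101):
    """Return (train_idx, val_idx) for a shuffled hold-out split."""
    idx = list(range(n_samples))
    # A dedicated, seeded generator makes the shuffle reproducible.
    random.Random(random_state).shuffle(idx)
    n_val = int(round(n_samples * val_size))
    return idx[n_val:], idx[:n_val]


train_idx, val_idx = holdout_split(10)  # 8 training rows, 2 validation rows
```

When X_val and y_val are passed explicitly, no such split takes place and val_size is ignored.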

encode(X, batch_size=64)

Encodes input data using the trained model’s embedding layer.

Parameters:

X (array-like or DataFrame) – Input data to be encoded.
batch_size (int, optional, default=64) – Batch size for encoding.

Returns:

Encoded representations of the input data.

Return type:

torch.Tensor

Raises:

ValueError – If the model or data module is not fitted.

evaluate(X, y_true, metrics=None, distribution_family=None)

Evaluate the model on the given data using specified metrics.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true target values against which to evaluate the predictions.
metrics (dict) – A dictionary where keys are metric names and values are tuples containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False).
distribution_family (str, optional) – Specifies the distribution family the model is predicting for. If None, it will attempt to infer based on the model’s settings.

Returns:

scores – A dictionary with metric names as keys and their corresponding scores as values.

Return type:

dict

Notes

This method uses either the predict or predict_proba method depending on the metric requirements.

fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)

Trains the regression model using the provided training data. Optionally, a separate validation set can be used.

Parameters:

X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.
train_metrics (dict[str, Callable] | None) – Dictionary of metric callables (e.g. from torchmetrics) to be logged during training.
val_metrics (dict[str, Callable] | None) – Dictionary of metric callables (e.g. from torchmetrics) to be logged during validation.
checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are being saved.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
**trainer_kwargs (dict) – Additional keyword arguments for PyTorch Lightning's Trainer class.

Returns:

self – The fitted regressor.

Return type:

object
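
The patience, mode, lr_patience and lr_factor arguments all describe patience-based rules on the monitored metric. The sketch below shows the general logic in plain Python. Both function names are hypothetical, and the real behaviour is delegated to the training framework, which may add options such as a minimum improvement threshold.

```python
def early_stopping_epoch(values, patience=15, mode="min"):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = None
    stale = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(values):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


def reduce_lr_on_plateau(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    stale = 0
    history = []
    for value in val_losses:
        if value < best:
            best, stale = value, 0
        else:
            stale += 1
            # Reduce only once the patience budget is exceeded, then start counting again.
            if stale > lr_patience:
                lr *= lr_factor
                stale = 0
        history.append(lr)
    return history


losses = [1.0, 0.8, 0.9, 0.85, 0.95]
stop_at = early_stopping_epoch(losses, patience=3)
lr_history = reduce_lr_on_plateau([1.0, 0.9, 0.95, 0.96, 0.97], lr=1.0,
                                  lr_patience=2, lr_factor=0.5)
```

In this toy run the best loss occurs at epoch 1, so with patience=3 training stops at epoch 4, and the learning rate is halved after the third epoch without improvement.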

get_default_metrics(distribution_family)

Provides default metrics based on the distribution family.

Parameters: distribution_family (str) – The distribution family for which to provide default metrics.
Returns: metrics – A dictionary of default metric functions.
Return type: dict

get_number_of_params(requires_grad=True)

Calculate the number of parameters in the model.

Parameters: requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
Returns: The total number of parameters in the model.
Return type: int
Raises: ValueError – If the model has not been built prior to calling this method.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

classmethod load(path)

Load and return a fitted model from path.

Parameters: path (str) – Path to a file previously written by save().
Returns: A fully reconstructed, ready-to-predict estimator.
Return type: estimator

optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)

Optimizes hyperparameters using Bayesian optimization with optional pruning.

Parameters:

X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.

Returns:

best_hparams – Best hyperparameters found during optimization.

Return type:

list

predict(X, raw=False, device=None)

Predicts target values for the given input samples.

Parameters: X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
Returns: predictions – The predicted target values.
Return type: ndarray, shape (n_samples,) or (n_samples, n_outputs)

save(path)

Save the fitted model to path.

Parameters: path (str) – Destination file path (e.g. "model.pt").
Raises: ValueError – If the model has not been fitted yet.
Return type: None

score(X, y, metric='NLL')

Calculate the score of the model using the specified metric.

Parameters:

X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (str, default="NLL") – The metric to compute. Currently, only negative log-likelihood ("NLL") is supported.

Returns:

score – The score calculated using the specified metric.

Return type:

float
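
To make the "NLL" metric concrete, here is a minimal sketch of the negative log-likelihood for the normal family, written with the standard library only. gaussian_nll is a hypothetical helper, not the library's implementation. Averaging over samples rather than summing is an assumption of this sketch.

```python
import math


def gaussian_nll(y_true, mean, std):
    """Average negative log-likelihood of y_true under N(mean_i, std_i**2)."""
    total = 0.0
    for y, mu, sigma in zip(y_true, mean, std):
        var = sigma ** 2
        # -log N(y | mu, var) = 0.5 * log(2 * pi * var) + (y - mu)^2 / (2 * var)
        total += 0.5 * math.log(2 * math.pi * var) + (y - mu) ** 2 / (2 * var)
    return total / len(y_true)


y = [0.0, 1.0, 2.0]
good = gaussian_nll(y, mean=[0.0, 1.0, 2.0], std=[1.0, 1.0, 1.0])
bad = gaussian_nll(y, mean=[3.0, 3.0, 3.0], std=[1.0, 1.0, 1.0])
```

Lower values are better: predicted distributions centred on the targets give a smaller NLL than predictions that are far off, and the score also penalises predicted standard deviations that are too narrow or too wide.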

set_params(**parameters)

Set the parameters of this estimator.

Parameters: **parameters (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: object
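
get_params and set_params follow the usual scikit-learn estimator convention. The toy class below sketches that convention without depending on deeptab. TinyEstimator is a hypothetical stand-in, and the real estimators expose many more parameters.

```python
class TinyEstimator:
    """Hypothetical estimator following the scikit-learn parameter convention."""

    def __init__(self, d_model=128, n_layers=4):
        self.d_model = d_model
        self.n_layers = n_layers

    def get_params(self, deep=True):
        # No nested estimators here, so `deep` has no effect in this sketch.
        return {"d_model": self.d_model, "n_layers": self.n_layers}

    def set_params(self, **parameters):
        valid = self.get_params()
        for name, value in parameters.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Returning self allows chained calls.
        return self


est = TinyEstimator()
params = est.set_params(n_layers=2).get_params()
```

Because set_params returns the estimator itself, it can be chained with other calls, which is what tools such as grid search rely on when cloning and reconfiguring estimators.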