TabTransformer
Transformer specialized for categorical features with contextual embeddings.
For detailed usage, configuration examples, and performance notes, see TabTransformer.
API Reference
- class deeptab.models.TabTransformerClassifier(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)[source]
TabTransformer classifier. This class extends the SklearnBaseClassifier class and uses the TabTransformer model with the default TabTransformer configuration.
- Parameters:
model_config (TabTransformerConfig, optional) – Architecture hyperparameters for the model. If None, a default TabTransformerConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig, optional) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.
Examples
>>> from deeptab.models import TabTransformerClassifier
>>> model = TabTransformerClassifier()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
- build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, stratify=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, class_weight=None, loss_fct=None, balanced_sampler=False, sample_weight=None)
Builds the model using the provided training data.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (class labels).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
stratify (bool) – Whether to stratify the validation split on y so the split keeps the same class proportions. Set to False for a purely random split.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
class_weight (str | dict | list | ndarray | None) – Weights associated with classes for imbalanced data. "balanced" mirrors scikit-learn and uses n_samples / (n_classes * bincount(y)). A mapping {class_label: weight} or an array (ordered like np.unique(y)) sets weights explicitly. Ignored when loss_fct is an nn.Module.
loss_fct (nn.Module, str, or None, default=None) – Custom loss. An nn.Module is used as-is; a registered loss name (e.g. "focal", "bce", "cross_entropy") is built and combined with class_weight. None falls back to the default (weighted) task loss.
balanced_sampler (bool) – If True, draw class-balanced mini-batches with a WeightedRandomSampler (oversamples minority classes).
sample_weight (array-like, optional) – Explicit per-row sampling weights (length matches X). Takes precedence over balanced_sampler and drives the WeightedRandomSampler.
- Returns:
self – The built classifier.
- Return type:
object
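The "balanced" formula quoted for class_weight, n_samples / (n_classes * bincount(y)), can be checked by hand. The following is a minimal pure-Python sketch of that arithmetic only; balanced_class_weights is a hypothetical helper written for illustration and is not part of deeptab.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} using n_samples / (n_classes * count).

    Hypothetical helper for illustration; not a deeptab function.
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    # sorted() gives the same label ordering as np.unique(y) would.
    return {
        label: n_samples / (n_classes * counts[label])
        for label in sorted(counts)
    }


# Six samples of class 0 and two of class 1: the rarer class gets the larger weight.
y = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y)
print(weights)
```

A mapping of this shape is what the class_weight parameter describes as {class_label: weight}.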
- property config
The instantiated model config object backing this estimator.
Stored on the private _config attribute so it stays out of sklearn’s get_params / __init__ introspection (it is derived from model_config / _model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.
- configure_observability(config)
Wire up logging backends described by config.
Can be called at any point, before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).
- Parameters:
config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
- Return type:
None
- describe()
Return a structured description of the estimator and fitted model.
The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.
- Return type:
dict[str,Any]
- encode(X, embeddings=None, batch_size=64)
Return dense embedding vectors from the model backbone.
Runs the fitted model’s encode method on batches of X and concatenates the results into a single tensor.
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input features to encode.
embeddings (array-like or None, optional) – Pre-computed external embeddings aligned with the rows of X.
batch_size (int, default=64) – Number of samples processed in each forward pass.
- Returns:
Encoded representations of the input data.
- Return type:
torch.Tensor of shape (n_samples, embedding_dim)
- Raises:
ValueError – If the model has not been fitted yet.
Examples
>>> clf = MLPClassifier()
>>> clf.fit(X_train, y_train)
>>> embeddings = clf.encode(X_test)  # (n_samples, embedding_dim)
>>> embeddings.shape
torch.Size([100, 64])
- evaluate(X, y_true, embeddings=None, metrics=None)
Evaluate the model on the given data using specified metrics.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true class labels.
embeddings (array-like or list, optional) – Embeddings for unstructured data inputs.
metrics (dict, optional) – A {name: callable} dictionary where each callable has the signature metric(y_true, y_pred) -> float. Each callable may be a DeepTabMetric instance or any plain callable. Metrics that need probability scores (e.g. AUROC, LogLoss) should accept the 2-D predict_proba output as y_pred; metrics that need class labels (e.g. Accuracy, F1) should accept the 1-D predict output.
For DeepTabMetric instances, the method inspects the name attribute to decide which prediction format to supply: probability-based metrics (auroc, auprc, log_loss, brier, ece) receive predict_proba output; all others receive predict output.
If None, defaults to the registry defaults for "classification" (Accuracy, AUROC, LogLoss).
- Returns:
scores – {metric_name: score} dictionary.
- Return type:
dict
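To make the metric(y_true, y_pred) -> float contract concrete, here is a small sketch of two plain callables, one label-based and one probability-based. The function names and toy data are assumptions made for illustration. The reference above does not say which prediction format a plain callable receives, so the sketch calls each metric by hand rather than through evaluate().

```python
import math


def accuracy(y_true, y_pred):
    """Label-based metric: y_pred is a 1-D sequence of predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, y_proba, eps=1e-15):
    """Probability-based metric: y_proba is 2-D, one row of class probabilities per sample.

    Assumes integer labels 0..n_classes-1 that index the probability columns.
    """
    total = 0.0
    for t, row in zip(y_true, y_proba):
        p = min(max(row[t], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


# A {name: callable} dictionary of the shape the metrics parameter describes.
metrics = {"accuracy": accuracy, "log_loss": log_loss}

y_true = [0, 1, 1]
labels = [0, 1, 0]                                # stand-in for predict output
proba = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]      # stand-in for predict_proba output

scores = {
    "accuracy": metrics["accuracy"](y_true, labels),
    "log_loss": metrics["log_loss"](y_true, proba),
}
print(scores)
```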
- fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, stratify=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, class_weight=None, loss_fct=None, balanced_sampler=False, sample_weight=None, **trainer_kwargs)
Trains the classification model using the provided training data. Optionally, a separate validation set can be used.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (class labels).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
stratify (bool) – Whether to stratify the validation split on y so the split keeps the same class proportions. Set to False for a purely random split. When a TrainerConfig is set, its stratify value takes precedence.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
class_weight (str | dict | list | ndarray | None) – Weights associated with classes for imbalanced data. "balanced" mirrors scikit-learn and uses n_samples / (n_classes * bincount(y)) so under-represented classes contribute more to the loss. A mapping {class_label: weight} or an array (ordered like np.unique(y)) sets weights explicitly. For binary targets the weights are converted to a pos_weight for BCEWithLogitsLoss; for multiclass they become the weight of CrossEntropyLoss. Ignored when loss_fct is an nn.Module.
loss_fct (nn.Module, str, or None, default=None) – Custom loss. An nn.Module is used as-is; a registered loss name (e.g. "focal", "bce", "cross_entropy") is built and combined with class_weight (see deeptab.training.losses.build_classification_loss()). None falls back to the default (weighted) task loss.
balanced_sampler (bool) – If True, draw class-balanced mini-batches with a WeightedRandomSampler (oversamples minority classes). This rebalances the data instead of (or in addition to) reweighting the loss.
sample_weight (array-like, optional) – Explicit per-row sampling weights (length matches X). Takes precedence over balanced_sampler; rows are drawn into batches in proportion to their weight.
**trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.
- Returns:
self – The fitted classifier.
- Return type:
object
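The idea behind balanced_sampler and sample_weight, drawing rows in proportion to a per-row weight, can be sketched without PyTorch. The example below assumes inverse class frequency as the per-row weight, which is one common choice and not necessarily what deeptab computes internally. It uses random.choices as a stand-in for sampling with replacement, and inverse_frequency_weights is a hypothetical helper.

```python
import random
from collections import Counter


def inverse_frequency_weights(y):
    """Per-row weights of 1 / class_count, so every class has the same total weight.

    Hypothetical helper; the weighting scheme is an assumption for illustration.
    """
    counts = Counter(y)
    return [1.0 / counts[label] for label in y]


# A 90/10 imbalanced label vector.
y = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(y)

# Draw row indices with replacement, in proportion to their weight.
rng = random.Random(0)
drawn = rng.choices(range(len(y)), weights=weights, k=1000)

# Share of minority-class rows among the drawn indices.
share_minority = sum(y[i] == 1 for i in drawn) / len(drawn)
print(round(share_minority, 2))
```

A list such as weights has the shape the sample_weight parameter describes: one value per row of X.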
- get_number_of_params(requires_grad=True)
Calculate the number of parameters in the model.
- Parameters:
requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
- Returns:
The total number of parameters in the model.
- Return type:
int
- Raises:
ValueError – If the model has not been built prior to calling this method.
- get_params(deep=True)
Get parameters for this estimator.
- classmethod load(path)
Load and return a fitted model from path.
- Parameters:
path (str) – Path to a file previously written by save().
- Returns:
A fully reconstructed, ready-to-predict estimator of the same type that was saved.
- Return type:
estimator
Examples
>>> loaded = MLPClassifier.load("my_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["task"])
classification
>>> print(loaded.n_features_in_)
6
- optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)
Optimizes hyperparameters using Bayesian optimization with optional pruning.
- Parameters:
X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.
- Returns:
best_hparams – Best hyperparameters found during optimization.
- Return type:
list
- parameter_table(trainable_only=False)
Return one row per model parameter as a pandas DataFrame.
- Parameters:
trainable_only (bool) – If True, include only parameters with requires_grad=True.
- Return type:
DataFrame
- predict(X, embeddings=None, device=None)
Predicts target labels for the given input samples.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
- Returns:
predictions – The predicted class labels.
- Return type:
ndarray, shape (n_samples,)
- predict_proba(X, embeddings=None, device=None)
Predicts class probabilities for the given input samples.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict class probabilities.
- Returns:
probabilities – The predicted class probabilities.
- Return type:
ndarray, shape (n_samples, n_classes)
- pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)
Pretrains the embedding layer of the model using a contrastive learning approach.
This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.
- Parameters:
pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.
- Raises:
ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.
Notes
This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.
- profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)
Build the model on a small data sample and run a dry forward pass.
Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.
- Parameters:
X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True, the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.
- Returns:
Keys:
builds – True if the model constructed without error.
error – Exception message when builds is False, else None.
device – Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype – Parameter dtype string (e.g. "float32").
total_params – Total number of model parameters.
trainable_params – Number of trainable parameters.
memory_mb – Estimated parameter memory in megabytes.
batch_shape – Shape of the first dummy batch drawn from the data module.
output_shape – Shape of the model output for that dummy batch (None on error).
loss_fct – Class name of the loss function.
forward_ms_median – Median forward-pass wall time in milliseconds (None on error).
forward_ms_min – Minimum forward-pass wall time in milliseconds (None on error).
describe – Full describe() dict (populated after build).
runtime – Full runtime_info() dict (populated after build).
- Return type:
dict[str,Any]
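The forward_ms_median and forward_ms_min keys report the median and minimum over n_forward_passes timed runs. A minimal sketch of that measurement follows, using a dummy workload in place of a real model forward pass; time_forward and dummy_forward are hypothetical names, and the actual profiling code may differ.

```python
import statistics
import time


def time_forward(forward, n_forward_passes=3):
    """Time repeated calls of `forward` and summarise them in milliseconds.

    Hypothetical helper; it only mirrors the two timing keys described above.
    """
    timings_ms = []
    for _ in range(n_forward_passes):
        start = time.perf_counter()
        forward()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "forward_ms_median": statistics.median(timings_ms),
        "forward_ms_min": min(timings_ms),
    }


def dummy_forward():
    """Stand-in workload; a real profile would run the model on one batch."""
    return sum(i * i for i in range(10_000))


report = time_forward(dummy_forward, n_forward_passes=3)
print(report)
```

The median is less sensitive than the mean to a single slow pass, such as a first call that pays one-off warm-up costs.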
- runtime_info()
Return runtime setup information for the estimator.
The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.
- Return type:
dict[str,Any]
- save(path=None)
Save the fitted model to path.
The bundle written by this method can be restored with load(). It contains all state required for inference: architecture/config, neural-network weights, fitted preprocessing state, feature schema, column order, task metadata, classifier classes (when available), and package versions for debugging reloads across environments.
- Parameters:
path (str | None) – Destination file path (e.g. "model.pt"). When None and a run directory is active (i.e. configure_observability was called with a config that creates a run dir), the model is saved to <run_dir>/artifacts/model.deeptab automatically. When no run dir is active either, raises ValueError.
- Returns:
The resolved path the bundle was written to.
- Return type:
str
- Raises:
ValueError – If the model has not been fitted yet, or path is None and no run directory is active.
Examples
>>> model = MLPClassifier()
>>> model.fit(X_train, y_train)
>>> saved_path = model.save("my_model.deeptab")
>>> loaded = MLPClassifier.load(saved_path)
>>> predictions = loaded.predict(X_test)
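The path-resolution rule described for save() (an explicit path wins, otherwise the active run directory is used, otherwise a ValueError is raised) can be sketched as a small standalone function. resolve_save_path and its run_dir argument are hypothetical and exist only to illustrate the documented behaviour.

```python
from pathlib import Path


def resolve_save_path(path=None, run_dir=None):
    """Mimic the documented fallback order for the save destination.

    Hypothetical helper; `run_dir` stands in for the active run directory.
    """
    if path is not None:
        return str(path)
    if run_dir is not None:
        # Documented default location inside the run directory.
        return str(Path(run_dir) / "artifacts" / "model.deeptab")
    raise ValueError("path is None and no run directory is active")


print(resolve_save_path("my_model.deeptab"))
print(resolve_save_path(run_dir="runs/exp1"))
```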
- score(X, y, embeddings=None, metric=None)
Calculate the score of the model using the specified metric.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,)) – The true class labels against which to evaluate the predictions.
metric (tuple or callable, optional) – A tuple containing the metric function and a boolean indicating whether the metric requires probability scores (True) or class labels (False). If omitted, accuracy is used to match scikit-learn classifier behavior.
- Returns:
score – The score calculated using the specified metric.
- Return type:
float
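The metric argument pairs a function with a flag that says whether it needs probabilities or labels. The sketch below shows one way such a tuple could be dispatched. apply_metric is hypothetical, and the handling of a bare callable (treated here as label-based) is an assumption, since the reference does not spell it out.

```python
def accuracy(y_true, y_pred):
    """Fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def brier_binary(y_true, y_proba):
    """Mean squared error between the positive-class probability and the 0/1 label."""
    return sum((row[1] - t) ** 2 for t, row in zip(y_true, y_proba)) / len(y_true)


def apply_metric(metric, y_true, labels, proba):
    """Hypothetical dispatch on the (function, needs_proba) tuple described above."""
    if metric is None:
        return accuracy(y_true, labels)  # documented default: accuracy
    if isinstance(metric, tuple):
        func, needs_proba = metric
        return func(y_true, proba if needs_proba else labels)
    # Assumption: a bare callable is treated as a label-based metric.
    return metric(y_true, labels)


y_true = [0, 1, 1, 0]
labels = [0, 1, 0, 0]                                       # stand-in for predict output
proba = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.9, 0.1]]    # stand-in for predict_proba output

default_score = apply_metric(None, y_true, labels, proba)
brier_score = apply_metric((brier_binary, True), y_true, labels, proba)
print(default_score, brier_score)
```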
- set_params(**parameters)
Set the parameters of this estimator.
- summary()
Return a compact human-readable model summary.
- Return type:
str
- property task_model
The fitted Lightning task model, or None before fitting.
This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.
- class deeptab.models.TabTransformerRegressor(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)[source]
TabTransformer regressor. This class extends the SklearnBaseRegressor class and uses the TabTransformer model with the default TabTransformer configuration.
- Parameters:
model_config (TabTransformerConfig, optional) – Architecture hyperparameters for the model. If None, a default TabTransformerConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig, optional) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.
Examples
>>> from deeptab.models import TabTransformerRegressor
>>> model = TabTransformerRegressor()
>>> model.fit(X_train, y_train)
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
- build_model(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})
Builds the model using the provided training data.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
- Returns:
self – The built regressor.
- Return type:
object
- property config
The instantiated model config object backing this estimator.
Stored on the private _config attribute so it stays out of sklearn’s get_params / __init__ introspection (it is derived from model_config / _model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.
- configure_observability(config)
Wire up logging backends described by config.
Can be called at any point, before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).
- Parameters:
config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
- Return type:
None
- describe()
Return a structured description of the estimator and fitted model.
The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.
- Return type:
dict[str,Any]
- encode(X, embeddings=None, batch_size=64)
Return dense embedding vectors from the model backbone.
Runs the fitted model’s encode method on batches of X and concatenates the results into a single tensor.
- Parameters:
X (array-like or DataFrame of shape (n_samples, n_features)) – Input features to encode.
embeddings (array-like or None, optional) – Pre-computed external embeddings aligned with the rows of X.
batch_size (int, default=64) – Number of samples processed in each forward pass.
- Returns:
Encoded representations of the input data.
- Return type:
torch.Tensor of shape (n_samples, embedding_dim)
- Raises:
ValueError – If the model has not been fitted yet.
Examples
>>> clf = MLPClassifier()
>>> clf.fit(X_train, y_train)
>>> embeddings = clf.encode(X_test)  # (n_samples, embedding_dim)
>>> embeddings.shape
torch.Size([100, 64])
- evaluate(X, y_true, embeddings=None, metrics=None)
Evaluate the model on the given data using specified metrics.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metrics (dict) – A dictionary where keys are metric names and values are the metric functions.
Notes
This method uses the predict method to generate predictions and computes each metric.
- Returns:
scores – A dictionary with metric names as keys and their corresponding scores as values.
- Return type:
dict
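For the regressor, metrics is a plain mapping from metric names to functions of the true and predicted targets. The sketch below builds such a dictionary with two hand-written metrics and applies it to toy values. The function names and data are illustrative assumptions, and the loop only imitates what evaluate() is described as doing with predict output.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Keys are metric names, values are the metric functions.
metrics = {"mse": mean_squared_error, "mae": mean_absolute_error}

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]   # stand-in for predict output

# Compute each metric on the predictions and collect {name: score}.
scores = {name: fn(y_true, y_pred) for name, fn in metrics.items()}
print(scores)
```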
- fit(X, y, val_size=0.2, X_val=None, y_val=None, embeddings=None, embeddings_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', dataloader_kwargs={}, train_metrics=None, val_metrics=None, rebuild=True, **trainer_kwargs)
Trains the regression model using the provided training data. Optionally, a separate validation set can be used.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.
dataloader_kwargs (dict, default={}) – The kwargs for the pytorch dataloader class.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
rebuild (bool, default=True) – Whether to rebuild the model if it has already been built.
**trainer_kwargs – Additional keyword arguments for PyTorch Lightning's Trainer class.
- Returns:
self – The fitted regressor.
- Return type:
object
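The patience and mode parameters describe a standard early-stopping rule: stop once the monitored value has failed to improve for patience consecutive epochs. The following is a simplified pure-Python sketch of that rule under stated assumptions. early_stop_epoch is a hypothetical helper; the real training loop relies on PyTorch Lightning, whose callback has further options (such as a minimum improvement threshold) that are ignored here.

```python
def early_stop_epoch(history, patience=15, mode="min"):
    """Return the 0-based epoch at which training would stop, or None if it never does.

    Hypothetical helper illustrating the patience/mode rule only.
    """
    best = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(history):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Monitored val_loss per epoch; the best value (0.69) is reached at epoch 5.
val_loss = [1.00, 0.80, 0.70, 0.71, 0.72, 0.69, 0.70, 0.70, 0.71]
stop_at = early_stop_epoch(val_loss, patience=3, mode="min")
print(stop_at)  # 8: three epochs in a row without beating 0.69
```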
- get_number_of_params(requires_grad=True)
Calculate the number of parameters in the model.
- Parameters:
requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
- Returns:
The total number of parameters in the model.
- Return type:
int
- Raises:
ValueError – If the model has not been built prior to calling this method.
- get_params(deep=True)
Get parameters for this estimator.
- classmethod load(path)
Load and return a fitted model from path.
- Parameters:
path (str) – Path to a file previously written by save().
- Returns:
A fully reconstructed, ready-to-predict estimator of the same type that was saved.
- Return type:
estimator
Examples
>>> loaded = MLPClassifier.load("my_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["task"])
classification
>>> print(loaded.n_features_in_)
6
- optimize_hparams(X, y, X_val=None, y_val=None, embeddings=None, embeddings_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)
Optimizes hyperparameters using Bayesian optimization with optional pruning.
- Parameters:
X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation targets.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.
- Returns:
best_hparams – Best hyperparameters found during optimization.
- Return type:
list
- parameter_table(trainable_only=False)
Return one row per model parameter as a pandas DataFrame.
- Parameters:
trainable_only (bool) – If True, include only parameters with requires_grad=True.
- Return type:
DataFrame
- predict(X, embeddings=None, device=None)
Predicts target values for the given input samples.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
- Returns:
predictions – The predicted target values.
- Return type:
ndarray, shape (n_samples,) or (n_samples, n_outputs)
- pretrain(pretrain_epochs=15, k_neighbors=10, temperature=0.1, save_path='pretrained_embeddings.pth', lr=0.001, use_positive=True, use_negative=False, pool_sequence=True)
Pretrains the embedding layer of the model using a contrastive learning approach.
This method performs pretraining by optimizing the embeddings with respect to neighborhood structure in the feature space. The embeddings are saved after training.
- Parameters:
pretrain_epochs (int, default=15) – Number of epochs to run pretraining.
k_neighbors (int, default=10) – Number of neighbors used in the contrastive loss computation.
temperature (float, default=0.1) – Temperature parameter for contrastive loss scaling.
save_path (str, default="pretrained_embeddings.pth") – Path to save the pretrained embeddings.
lr (float, default=1e-3) – Learning rate for the pretraining optimizer.
use_positive (bool, default=True) – Whether to include positive pairs in contrastive learning.
use_negative (bool, default=False) – Whether to include negative pairs in contrastive learning.
pool_sequence (bool, default=True) – Whether to apply sequence pooling before computing contrastive loss.
- Raises:
ValueError – If the model has not been built before calling this method.
ValueError – If the model does not contain an embedding layer.
Notes
This function requires that self.build_model() has been called beforehand.
The pretraining method uses self.task_model.estimator.embedding_layer.
The method invokes super()._pretrain() with regression mode enabled.
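The effect of the temperature parameter can be illustrated with a generic InfoNCE-style contrastive loss. This is only a sketch: the reference above does not spell out the exact loss used by pretrain(), and the function info_nce and its arguments are hypothetical names chosen for illustration.

```python
import math


def info_nce(pos_sim, neg_sims, temperature=0.1):
    """Generic InfoNCE loss for one anchor (illustrative, not deeptab's code)."""
    # Divide every similarity by the temperature before the softmax.
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Numerically stable log-sum-exp over the positive and all negatives.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(z - top) for z in logits))
    # Negative log-probability assigned to the positive pair.
    return log_denominator - logits[0]


# A low temperature sharpens the softmax, a high one flattens it.
sharp = info_nce(0.9, [0.1, 0.2], temperature=0.1)
soft = info_nce(0.9, [0.1, 0.2], temperature=1.0)
```

With the same similarities, the lower temperature yields the smaller loss here because the positive pair already has the highest similarity.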
- profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)
Build the model on a small data sample and run a dry forward pass.
Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.
- Parameters:
X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True, the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.
- Returns:
Keys:
builds – True if the model constructed without error.
error – Exception message when builds is False, else None.
device – Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype – Parameter dtype string (e.g. "float32").
total_params – Total number of model parameters.
trainable_params – Number of trainable parameters.
memory_mb – Estimated parameter memory in megabytes.
batch_shape – Shape of the first dummy batch drawn from the data module.
output_shape – Shape of the model output for that dummy batch (None on error).
loss_fct – Class name of the loss function.
forward_ms_median – Median forward-pass wall time in milliseconds (None on error).
forward_ms_min – Minimum forward-pass wall time in milliseconds (None on error).
describe – Full describe() dict (populated after build).
runtime – Full runtime_info() dict (populated after build).
- Return type:
dict[str,Any]
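The forward_ms_median and forward_ms_min values can be pictured as the result of timing a callable several times. The sketch below uses only the standard library; time_forward is a hypothetical helper, not part of deeptab, and the lambda stands in for a real forward pass.

```python
import statistics
import time


def time_forward(forward, n_forward_passes=3):
    """Call `forward` repeatedly and return (median_ms, min_ms)."""
    durations_ms = []
    for _ in range(n_forward_passes):
        start = time.perf_counter()
        forward()
        # Convert elapsed seconds to milliseconds.
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations_ms), min(durations_ms)


# Stand-in workload; a real profile would run the model on one batch.
median_ms, min_ms = time_forward(lambda: sum(i * i for i in range(10_000)))
```

Reporting the median rather than the mean keeps a single slow pass (for example a cold cache) from dominating the estimate.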
- runtime_info()
Return runtime setup information for the estimator.
The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.
- Return type:
dict[str,Any]
- save(path=None)
Save the fitted model to path.
The bundle written by this method can be restored with load(). It contains all state required for inference: architecture/config, neural-network weights, fitted preprocessing state, feature schema, column order, task metadata, classifier classes (when available), and package versions for debugging reloads across environments.
- Parameters:
path (str | None) – Destination file path (e.g. "model.pt"). When None and a run directory is active (i.e. configure_observability was called with a config that creates a run dir), the model is saved to <run_dir>/artifacts/model.deeptab automatically. When no run dir is active either, raises ValueError.
- Returns:
The resolved path the bundle was written to.
- Return type:
str
- Raises:
ValueError – If the model has not been fitted yet, or path is None and no run directory is active.
Examples
>>> model = MLPClassifier()
>>> model.fit(X_train, y_train)
>>> saved_path = model.save("my_model.deeptab")
>>> loaded = MLPClassifier.load(saved_path)
>>> predictions = loaded.predict(X_test)
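The path-resolution rule described for save() can be sketched in a few lines. This is an illustration of the documented behaviour only; resolve_save_path and its run_dir argument are hypothetical and do not correspond to a deeptab function.

```python
from pathlib import Path


def resolve_save_path(path=None, run_dir=None):
    """Mimic the documented rule for choosing where the bundle is written."""
    if path is not None:
        # An explicit path always wins.
        return str(path)
    if run_dir is not None:
        # Fall back to the artifacts folder of the active run directory.
        return str(Path(run_dir) / "artifacts" / "model.deeptab")
    # Neither an explicit path nor an active run directory.
    raise ValueError("path is None and no run directory is active")


explicit = resolve_save_path("my_model.deeptab")
fallback = resolve_save_path(run_dir="runs/example")
```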
- score(X, y, embeddings=None, metric=<function r2_score>)
Calculate the score of the model using the specified metric.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (callable, default=r2_score) – The metric function to use for evaluation. Must be a callable with the signature metric(y_true, y_pred). Defaults to r2_score to match scikit-learn’s RegressorMixin convention (higher is better).
- Returns:
score – The score calculated using the specified metric.
- Return type:
float
Examples
>>> from sklearn.metrics import mean_squared_error, mean_absolute_error
>>> model.score(X_test, y_test)  # R² (default)
>>> model.score(X_test, y_test, metric=mean_squared_error)  # MSE
>>> model.score(X_test, y_test, metric=mean_absolute_error)  # MAE
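Any function with the signature metric(y_true, y_pred) should be usable as the metric argument. As a minimal sketch of that contract, here is a hand-written R² in plain Python; the name r2 is hypothetical, and scikit-learn's r2_score remains the documented default.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, written out for illustration."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # Note: undefined (division by zero) when y_true is constant.
    return 1.0 - ss_res / ss_tot


perfect = r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline = r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect prediction scores 1.0 and predicting the mean everywhere scores 0.0, which is why higher is better for this default.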
- set_params(**parameters)
Set the parameters of this estimator.
- summary()
Return a compact human-readable model summary.
- Return type:
str
- property task_model
The fitted Lightning task model, or None before fitting.
This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.
- class deeptab.models.TabTransformerLSS(model_config=None, preprocessing_config=None, trainer_config=None, observability_config=None, random_state=None)[source]
TabTransformer for distributional regression. This class extends the SklearnBaseLSS class and uses the TabTransformer model with the default TabTransformer configuration.
- Parameters:
model_config (TabTransformerConfig, optional) – Architecture hyperparameters for the model. If None, a default TabTransformerConfig is used. See that class for the full list of available fields.
preprocessing_config (PreprocessingConfig, optional) – Feature preprocessing settings such as scaling, encoding, and numerical embeddings. If None, defaults from PreprocessingConfig are used.
trainer_config (TrainerConfig, optional) – Training-loop settings such as epochs, batch size, learning rate, and early stopping. If None, defaults from TrainerConfig are used.
observability_config (ObservabilityConfig | None) – Optional logging, experiment tracking, and run-directory settings (deeptab.core.observability.ObservabilityConfig). If None, observability is disabled and the estimator emits nothing.
random_state (int, optional) – Seed for reproducible weight initialisation and data shuffling.
Examples
>>> from deeptab.models import TabTransformerLSS
>>> model = TabTransformerLSS()
>>> model.fit(X_train, y_train, family='normal')
>>> preds = model.predict(X_test)
>>> model.evaluate(X_test, y_test)
- build_model(X, y, val_size=0.2, X_val=None, y_val=None, random_state=101, batch_size=128, shuffle=True, lr=None, lr_patience=None, lr_factor=None, weight_decay=None, train_metrics=None, val_metrics=None, dataloader_kwargs={})
Builds the model using the provided training data.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
dataloader_kwargs (dict, default={}) – The kwargs for the PyTorch DataLoader class.
- Returns:
self – The built distributional regressor.
- Return type:
object
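The interplay of lr_patience and lr_factor can be illustrated with a generic reduce-on-plateau rule. This is a sketch under assumptions: the reference does not say which scheduler deeptab uses internally, and reduce_on_plateau is a hypothetical helper that reduces the rate once the number of non-improving epochs exceeds the patience.

```python
def reduce_on_plateau(val_losses, lr, lr_patience, lr_factor):
    """Return the learning rate in effect after each epoch (illustrative)."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > lr_patience:
                # Patience exhausted: shrink the learning rate.
                lr *= lr_factor
                bad_epochs = 0
        history.append(lr)
    return history


history = reduce_on_plateau(
    [1.0, 0.9, 0.95, 0.96, 0.97, 0.5], lr=0.1, lr_patience=2, lr_factor=0.5
)
```

In this run the loss stalls for three epochs after the second one, so the rate is halved once, at the fifth epoch.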
- property config
The instantiated model config object backing this estimator.
Stored on the private _config attribute so it stays out of sklearn’s get_params/__init__ introspection (it is derived from model_config/_model_cls rather than a constructor parameter), while remaining readable and settable as estimator.config.
- configure_observability(config)
Wire up logging backends described by config.
Can be called at any point, before or after fit(). Changes take effect on the next lifecycle event emitted (i.e. the next fit() or predict() call).
- Parameters:
config (ObservabilityConfig) – Observability settings. Imports optional dependencies lazily; raises ImportError with install hints if they are absent.
- Return type:
None
- describe()
Return a structured description of the estimator and fitted model.
The method is safe to call before fitting. Parameter counts and feature metadata are included only after the model has been built.
- Return type:
dict[str,Any]
- encode(X, batch_size=64)
Encodes input data using the trained model’s embedding layer.
- Parameters:
X (array-like or DataFrame) – Input data to be encoded.
batch_size (int, optional, default=64) – Batch size for encoding.
- Returns:
Encoded representations of the input data.
- Return type:
torch.Tensor
- Raises:
ValueError – If the model or data module is not fitted.
- evaluate(X, y_true, metrics=None, distribution_family=None)
Evaluate the model on the given data using specified metrics.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y_true (array-like of shape (n_samples,)) – The true target values.
metrics (dict, optional) – A {name: callable} dictionary of metric functions with signature metric(y_true, y_pred) -> float. Each callable may be a DeepTabMetric instance or any plain callable. When a metric has needs_raw=True, raw model logits are passed instead of transformed distribution parameters. If None, the default metrics for the distribution family are used (see deeptab.metrics.get_default_metrics()).
distribution_family (str, optional) – Distribution family key (e.g. "normal", "gamma"). Inferred from the fitted model when None.
- Returns:
scores – {metric_name: score} dictionary.
- Return type:
dict
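The {name: callable} contract for metrics can be sketched without the library. The example assumes, purely for illustration, that predictions for a normal family arrive as (mean, scale) pairs; that layout, and the helper names mae_of_mean, rmse_of_mean and apply_metrics, are hypothetical and not taken from deeptab.

```python
def mae_of_mean(y_true, y_pred):
    """Mean absolute error between targets and predicted means."""
    # Assumption: each prediction is a (mean, scale) pair.
    return sum(abs(t - mu) for t, (mu, _) in zip(y_true, y_pred)) / len(y_true)


def rmse_of_mean(y_true, y_pred):
    """Root mean squared error between targets and predicted means."""
    mse = sum((t - mu) ** 2 for t, (mu, _) in zip(y_true, y_pred)) / len(y_true)
    return mse ** 0.5


def apply_metrics(y_true, y_pred, metrics):
    """Apply every callable in a {name: callable} dict and collect the scores."""
    return {name: float(fn(y_true, y_pred)) for name, fn in metrics.items()}


y_true = [1.0, 2.0, 3.0]
y_pred = [(1.0, 0.5), (2.0, 0.5), (5.0, 0.5)]
scores = apply_metrics(y_true, y_pred, {"mae": mae_of_mean, "rmse": rmse_of_mean})
```

The result has the same shape as the documented return value: one float per metric name.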
- fit(X, y, family, val_size=0.2, X_val=None, y_val=None, max_epochs=100, random_state=101, batch_size=128, shuffle=True, patience=15, monitor='val_loss', mode='min', lr=None, lr_patience=None, lr_factor=None, weight_decay=None, checkpoint_path='model_checkpoints', distributional_kwargs=None, train_metrics=None, val_metrics=None, dataloader_kwargs={}, rebuild=True, **trainer_kwargs)
Trains the regression model using the provided training data. Optionally, a separate validation set can be used.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values (real numbers).
family (str) – The name of the distribution family to use for the loss function. Examples include ‘normal’ for regression tasks.
val_size (float) – The proportion of the dataset to include in the validation split if X_val is None. Ignored if X_val is provided.
X_val (DataFrame or array-like, shape (n_samples, n_features), optional) – The validation input samples. If provided, X and y are not split and this data is used for validation.
y_val (array-like, shape (n_samples,) or (n_samples, n_targets), optional) – The validation target values. Required if X_val is provided.
max_epochs (int) – Maximum number of epochs for training.
random_state (int) – Controls the shuffling applied to the data before applying the split.
batch_size (int) – Number of samples per gradient update.
shuffle (bool) – Whether to shuffle the training data before each epoch.
patience (int) – Number of epochs with no improvement on the validation loss to wait before early stopping.
monitor (str) – The metric to monitor for early stopping.
mode (str) – Whether the monitored metric should be minimized (min) or maximized (max).
lr (float | None) – Learning rate for the optimizer.
lr_patience (int | None) – Number of epochs with no improvement on the validation loss to wait before reducing the learning rate.
lr_factor (float | None) – Factor by which the learning rate will be reduced.
weight_decay (float | None) – Weight decay (L2 penalty) coefficient.
distributional_kwargs (dict, default=None) – Any arguments that are specific to a certain distribution.
train_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during training.
val_metrics (dict[str, Callable] | None) – torch.metrics dict to be logged during validation.
checkpoint_path (str, default="model_checkpoints") – Path where the checkpoints are saved.
dataloader_kwargs (dict, default={}) – The kwargs for the PyTorch DataLoader class.
**trainer_kwargs (dict) – Additional keyword arguments for PyTorch Lightning's Trainer class.
- Returns:
self – The fitted regressor.
- Return type:
object
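How patience, monitor and mode interact can be shown with a generic early-stopping loop. This is a sketch, not deeptab's or Lightning's implementation; early_stop_epoch is a hypothetical helper, and it assumes training stops once the count of consecutive non-improving epochs reaches patience.

```python
def early_stop_epoch(values, patience=15, mode="min"):
    """Return the epoch index at which training would stop, or None."""
    best = None
    bad_epochs = 0
    for epoch, value in enumerate(values):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            # New best value of the monitored metric: reset the counter.
            best = value
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    # The metric kept improving often enough; training ran to completion.
    return None


stopped_at = early_stop_epoch([1.0, 0.8, 0.9, 0.85, 0.81], patience=3, mode="min")
never = early_stop_epoch([1.0, 0.8, 0.9, 0.85, 0.81], patience=5, mode="min")
```

With mode set to max the same loop applies to metrics where larger is better, such as accuracy.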
- get_default_metrics(distribution_family)
Return default evaluation metrics for the given distribution family.
Delegates to deeptab.metrics.get_default_metrics_dict(), which returns a {name: DeepTabMetric} dictionary covering all supported distribution families.
- Parameters:
distribution_family (str) – Distribution family key, e.g. "normal", "gamma".
- Returns:
{metric_name: callable} dictionary of metric functions.
- Return type:
dict
- get_number_of_params(requires_grad=True)
Calculate the number of parameters in the model.
- Parameters:
requires_grad (bool, optional) – If True, only count the parameters that require gradients (trainable parameters). If False, count all parameters. Default is True.
- Returns:
The total number of parameters in the model.
- Return type:
int
- Raises:
ValueError – If the model has not been built prior to calling this method.
- get_params(deep=True)
Get parameters for this estimator.
- classmethod load(path)
Load and return a fitted model from path.
- Parameters:
path (str) – Path to a file previously written by save().
- Returns:
A fully reconstructed, ready-to-predict estimator. Exposes artifact_metadata_, architecture_metadata_, feature_schema_, input_columns_, task_info_, classes_, and versions_ attributes after loading.
- Return type:
estimator
Examples
>>> loaded = MLPLSS.load("my_lss_model.deeptab")
>>> predictions = loaded.predict(X_test)
>>> print(loaded.task_info_["family"])
normal
- optimize_hparams(X, y, X_val=None, y_val=None, time=100, max_epochs=200, prune_by_epoch=True, prune_epoch=5, fixed_params={'cat_encoding': 'int', 'head_layer_size_length': 0, 'head_skip_layer': False, 'head_skip_layers': False, 'pooling_method': 'avg', 'use_cls': False}, custom_search_space=None, **optimize_kwargs)
Optimizes hyperparameters using Bayesian optimization with optional pruning.
- Parameters:
X (array-like) – Training data.
y (array-like) – Training labels.
X_val (array-like, optional) – Validation data.
y_val (array-like, optional) – Validation labels.
time (int) – The number of optimization trials to run.
max_epochs (int) – Maximum number of epochs for training.
prune_by_epoch (bool) – Whether to prune based on a specific epoch (True) or the best validation loss (False).
prune_epoch (int) – The specific epoch to prune by when prune_by_epoch is True.
**optimize_kwargs (dict) – Additional keyword arguments passed to the fit method.
- Returns:
best_hparams – Best hyperparameters found during optimization.
- Return type:
list
- parameter_table(trainable_only=False)
Return one row per model parameter as a pandas DataFrame.
- Parameters:
trainable_only (bool) – If True, include only parameters with requires_grad=True.
- Return type:
DataFrame
- predict(X, raw=False, device=None)
Predicts target values for the given input samples.
- Parameters:
X (DataFrame or array-like, shape (n_samples, n_features)) – The input samples for which to predict target values.
- Returns:
predictions – The predicted target values.
- Return type:
ndarray, shape (n_samples,) or (n_samples, n_outputs)
- profile(X, y, dry_run=True, n_forward_passes=3, batch_size=None, random_state=0)
Build the model on a small data sample and run a dry forward pass.
Combines describe(), runtime_info(), and a timed forward pass to give a complete pre-training picture without running any gradient updates.
- Parameters:
X (DataFrame or array-like) – Feature matrix. The first min(256, len(X)) rows are used for the dry-run build.
y (array-like) – Target vector aligned with X.
dry_run (bool) – When True, the temporary model is discarded after profiling so the estimator’s state is left unchanged (unless the model was already built, in which case the existing model is used directly).
n_forward_passes (int) – Number of forward passes used to estimate per-batch runtime. The median is reported to reduce noise.
batch_size (int | None) – Override the batch size used for timing. Defaults to the value in trainer_config or 64.
random_state (int) – Seed passed to the dry-run build for reproducibility.
- Returns:
Keys:
builds – True if the model constructed without error.
error – Exception message when builds is False, else None.
device – Device string (e.g. "cpu", "mps:0", "cuda:0").
dtype – Parameter dtype string (e.g. "float32").
total_params – Total number of model parameters.
trainable_params – Number of trainable parameters.
memory_mb – Estimated parameter memory in megabytes.
batch_shape – Shape of the first dummy batch drawn from the data module.
output_shape – Shape of the model output for that dummy batch (None on error).
loss_fct – Class name of the loss function.
forward_ms_median – Median forward-pass wall time in milliseconds (None on error).
forward_ms_min – Minimum forward-pass wall time in milliseconds (None on error).
describe – Full describe() dict (populated after build).
runtime – Full runtime_info() dict (populated after build).
- Return type:
dict[str,Any]
- runtime_info()
Return runtime setup information for the estimator.
The method is safe to call before fitting. Device and dtype are inferred from model parameters when a model has been built.
- Return type:
dict[str,Any]
- save(path)
Save the fitted model to path.
The bundle written by this method can be restored with load(). It contains all state required for inference: the architecture/config, neural-network weights, fitted preprocessing state, feature schema and column order, task metadata, distribution family, classifier classes for categorical LSS models, and package versions for debugging reloads across environments.
The bundle is built by build_save_bundle(), which is the single source of truth for artifact structure across all model variants.
- Parameters:
path (str) – Destination file path (e.g. "model.pt").
- Raises:
ValueError – If the model has not been fitted yet.
- Return type:
None
Examples
>>> model = MLPLSS()
>>> model.fit(X_train, y_train, family="normal")
>>> model.save("my_lss_model.deeptab")
>>> loaded = MLPLSS.load("my_lss_model.deeptab")
>>> predictions = loaded.predict(X_test)
- score(X, y, metric='NLL')
Calculate the score of the model using the specified metric.
- Parameters:
X (array-like or pd.DataFrame of shape (n_samples, n_features)) – The input samples to predict.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – The true target values against which to evaluate the predictions.
metric (str, default="NLL") – The metric to compute. So far, only negative log-likelihood ("NLL") is supported.
- Returns:
score – The score calculated using the specified metric.
- Return type:
float
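For the normal family, the negative log-likelihood behind the "NLL" metric has a closed form that is easy to write out. The sketch below is an assumption-laden illustration: gaussian_nll is a hypothetical helper, and the reference does not state whether deeptab averages or sums over samples (the mean is used here).

```python
import math


def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood of y_true under N(mu_i, sigma_i^2)."""
    total = 0.0
    for y, m, s in zip(y_true, mu, sigma):
        # -log p(y) = 0.5 * log(2 * pi * s^2) + (y - m)^2 / (2 * s^2)
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (y - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)


# Predictions centred on the targets score lower (better) than shifted ones.
good = gaussian_nll([0.0, 1.0], mu=[0.0, 1.0], sigma=[1.0, 1.0])
bad = gaussian_nll([0.0, 1.0], mu=[2.0, 3.0], sigma=[1.0, 1.0])
```

Unlike R², lower is better for this score, so it should not be compared directly with the default score() of the regressor or classifier.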
- set_params(**parameters)
Set the parameters of this estimator.
- summary()
Return a compact human-readable model summary.
- Return type:
str
- property task_model
The fitted Lightning task model, or None before fitting.
This exposes the underlying TaskModel (which holds the architecture via task_model.estimator and the loss via task_model.loss_fct) as a stable, public read-only attribute.