Configurations API

Base configs

These three classes form the core of the split-config API and are shared across all models.

class deeptab.configs.TrainerConfig(max_epochs=100, batch_size=128, val_size=0.2, shuffle=True, stratify=True, patience=15, monitor='val_loss', mode='min', lr=0.0001, lr_patience=10, lr_factor=0.1, weight_decay=1e-06, optimizer_type='Adam', optimizer_kwargs=None, scheduler_type='ReduceLROnPlateau', scheduler_kwargs=None, scheduler_monitor=None, scheduler_interval='epoch', scheduler_frequency=1, no_weight_decay_for_bias_and_norm=False, checkpoint_path='model_checkpoints')

Configuration for training loop, optimizer, and runtime execution.

These settings are entirely separate from model architecture. They control how a model is trained and executed, not what the model is.

Parameters:
  • max_epochs (int) – Maximum number of training epochs.

  • batch_size (int) – Number of samples per gradient update.

  • val_size (float) – Fraction of the training data held out for validation when no explicit validation set is provided.

  • shuffle (bool) – Whether to shuffle training data before each epoch.

  • stratify (bool) – Whether to stratify the validation split on y for classification tasks, so the train and validation sets keep the same class proportions. Has no effect on regression, where a continuous target cannot be stratified. Set to False to draw a purely random split.

  • patience (int) – Number of epochs with no improvement on monitor before early stopping is triggered.

  • monitor (str) – Metric name to monitor for early stopping and checkpoint selection.

  • mode (str) – Whether the monitored metric should be minimised ("min") or maximised ("max").

  • lr (float) – Learning rate for the optimizer.

  • lr_patience (int) – Number of epochs with no improvement before the learning rate is reduced by lr_factor.

  • lr_factor (float) – Multiplicative factor applied to the learning rate once lr_patience epochs pass without improvement.

  • weight_decay (float) – L2 regularisation coefficient (weight decay) for the optimizer.

  • optimizer_type (str) – Optimizer class name. Must be a valid torch.optim class name or a name registered in the project’s optimizer registry.

  • optimizer_kwargs (dict | None) – Extra keyword arguments forwarded to the optimizer constructor.

  • scheduler_type (str | None) – LR-scheduler class name (case-insensitive), or None / "none" to disable the scheduler entirely.

  • scheduler_kwargs (dict | None) – Extra keyword arguments forwarded to the scheduler constructor. factor and patience are synthesised from lr_factor and lr_patience for ReduceLROnPlateau when absent here.

  • scheduler_monitor (str | None) – Metric name for the scheduler to monitor. Falls back to the value of monitor when None.

  • scheduler_interval (str) – Lightning scheduling granularity: "epoch" or "step".

  • scheduler_frequency (int) – How often the scheduler steps at the given interval.

  • no_weight_decay_for_bias_and_norm (bool) – When True, bias vectors and normalisation-layer scale/shift parameters receive zero weight decay. Recommended for transformer-style models with LayerNorm.

  • checkpoint_path (str) – Directory where PyTorch Lightning model checkpoints are saved.
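The scheduler-related fields interact in three ways described above: "none" disables the scheduler, factor and patience are filled in from lr_factor and lr_patience for ReduceLROnPlateau, and scheduler_monitor falls back to monitor. The following is a minimal sketch of that resolution logic. TrainerSettings and resolve_scheduler are hypothetical stand-ins written for illustration, not part of deeptab.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainerSettings:
    """Hypothetical stand-in holding only the scheduler-related fields."""
    monitor: str = "val_loss"
    lr_patience: int = 10
    lr_factor: float = 0.1
    scheduler_type: Optional[str] = "ReduceLROnPlateau"
    scheduler_kwargs: Optional[dict] = None
    scheduler_monitor: Optional[str] = None


def resolve_scheduler(cfg: TrainerSettings) -> Optional[dict]:
    """Return the effective scheduler settings, or None when disabled."""
    if cfg.scheduler_type is None or cfg.scheduler_type.lower() == "none":
        return None
    kwargs = dict(cfg.scheduler_kwargs or {})
    if cfg.scheduler_type.lower() == "reducelronplateau":
        # Explicit scheduler_kwargs win; the lr_* fields only fill the gaps.
        kwargs.setdefault("factor", cfg.lr_factor)
        kwargs.setdefault("patience", cfg.lr_patience)
    return {
        "type": cfg.scheduler_type,
        "kwargs": kwargs,
        "monitor": cfg.scheduler_monitor or cfg.monitor,
    }


default = resolve_scheduler(TrainerSettings())
custom = resolve_scheduler(TrainerSettings(scheduler_kwargs={"factor": 0.5}))
disabled = resolve_scheduler(TrainerSettings(scheduler_type="none"))
```

In this sketch an explicit scheduler_kwargs entry always takes precedence, so setting factor there overrides lr_factor while patience is still taken from lr_patience.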

batch_size: int = 128
checkpoint_path: str = 'model_checkpoints'
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

lr: float = 0.0001
lr_factor: float = 0.1
lr_patience: int = 10
max_epochs: int = 100
mode: str = 'min'
monitor: str = 'val_loss'
no_weight_decay_for_bias_and_norm: bool = False
optimizer_kwargs: dict | None = None
optimizer_type: str = 'Adam'
patience: int = 15
scheduler_frequency: int = 1
scheduler_interval: str = 'epoch'
scheduler_kwargs: dict | None = None
scheduler_monitor: str | None = None
scheduler_type: str | None = 'ReduceLROnPlateau'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
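get_params and set_params follow the scikit-learn estimator convention. The sketch below shows the round trip on a flat config. TinyConfig is a hypothetical stand-in with two TrainerConfig-like fields, and its method bodies are an illustration of the convention rather than deeptab's implementation.

```python
from dataclasses import dataclass, fields


@dataclass
class TinyConfig:
    """Hypothetical stand-in with two TrainerConfig-like fields."""
    lr: float = 1e-4
    batch_size: int = 128

    def get_params(self, deep: bool = True) -> dict:
        # No nested estimators here, so `deep` makes no difference.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def set_params(self, **params) -> "TinyConfig":
        valid = {f.name for f in fields(self)}
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


cfg = TinyConfig().set_params(lr=3e-4)
params = cfg.get_params()
clone = TinyConfig(**params)  # params are sufficient to rebuild the config
```

Because get_params returns exactly the constructor arguments, a config can be copied or passed to a hyperparameter search by name.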

shuffle: bool = True
stratify: bool = True
val_size: float = 0.2
weight_decay: float = 1e-06
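For TrainerConfig, the interplay of patience, monitor and mode can be pictured with a small early-stopping loop. This is a sketch under the assumption that an epoch counts as an improvement only when the monitored value is strictly better than the best seen so far. early_stop_epoch is a hypothetical helper, not a deeptab function.

```python
from typing import List, Optional


def early_stop_epoch(history: List[float], patience: int = 15, mode: str = "min") -> Optional[int]:
    """Return the epoch index at which training would stop, or None."""
    best = None
    stale = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(history):
        if mode == "min":
            improved = best is None or value < best
        else:
            improved = best is None or value > best
        if improved:
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


val_loss = [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63]
stop = early_stop_epoch(val_loss, patience=3, mode="min")
```

With patience=3 the run above stops at epoch index 5, because epochs 3, 4 and 5 all fail to beat the best value of 0.6.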
class deeptab.configs.PreprocessingConfig(numerical_preprocessing=None, categorical_preprocessing=None, n_bins=None, feature_preprocessing=None, use_decision_tree_bins=None, binning_strategy=None, task=None, cat_cutoff=None, treat_all_integers_as_numerical=None, degree=None, scaling_strategy=None, n_knots=None, use_decision_tree_knots=None, knots_strategy=None, spline_implementation=None)

Configuration for input feature preprocessing.

All fields map directly to arguments accepted by pretab.preprocessor.Preprocessor. Using None for any field leaves the preprocessor default in effect.

Parameters:
  • numerical_preprocessing (str | None) – Strategy for transforming numerical features (e.g. "ple", "quantile", "standard"). None uses the preprocessor’s built-in default.

  • categorical_preprocessing (str | None) – Strategy for transforming categorical features (e.g. "int", "one-hot"). None uses the preprocessor’s built-in default.

  • n_bins (int | None) – Number of bins for numerical binning. None uses the preprocessor default.

  • feature_preprocessing (str | None) – General feature-level preprocessing override.

  • use_decision_tree_bins (bool | None) – Whether to use decision-tree-derived bin edges.

  • binning_strategy (str | None) – Strategy for choosing bin edges (e.g. "uniform", "quantile").

  • task (str | None) – Task type passed to the preprocessor for task-aware transformations (e.g. "regression", "classification").

  • cat_cutoff (float | None) – Threshold for treating integer columns as categorical.

  • treat_all_integers_as_numerical (bool | None) – When True, integer columns are never converted to categorical.

  • degree (int | None) – Polynomial / spline degree for numerical feature expansion.

  • scaling_strategy (str | None) – Scaling method applied to numerical features (e.g. "standard", "minmax", "robust").

  • n_knots (int | None) – Number of knots for spline preprocessing.

  • use_decision_tree_knots (bool | None) – Whether to use decision-tree-derived knot positions.

  • knots_strategy (str | None) – Strategy for knot placement.

  • spline_implementation (str | None) – Backend used for spline transformations.

binning_strategy: str | None = None
cat_cutoff: float | None = None
categorical_preprocessing: str | None = None
degree: int | None = None
feature_preprocessing: str | None = None
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

knots_strategy: str | None = None
n_bins: int | None = None
n_knots: int | None = None
numerical_preprocessing: str | None = None
scaling_strategy: str | None = None
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

spline_implementation: str | None = None
task: str | None = None
to_preprocessor_kwargs()

Return a dict of non-None fields suitable for passing to Preprocessor(**...).

Returns:

Mapping of field name → value for every field that is not None.

Return type:

dict
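The filtering that to_preprocessor_kwargs describes can be sketched as follows. PreprocessingSettings is a hypothetical stand-in carrying three of the documented fields, and the method body is an assumption about how the non-None filter might be written.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PreprocessingSettings:
    """Hypothetical stand-in with three of the documented fields."""
    numerical_preprocessing: Optional[str] = None
    n_bins: Optional[int] = None
    use_decision_tree_bins: Optional[bool] = None

    def to_preprocessor_kwargs(self) -> dict:
        # Compare against None so that False and 0 are still forwarded.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


kwargs = PreprocessingSettings(n_bins=32, use_decision_tree_bins=False).to_preprocessor_kwargs()
empty = PreprocessingSettings().to_preprocessor_kwargs()
```

Testing with `is not None` rather than truthiness matters here: an explicit False must reach the preprocessor, while an unset field must leave the preprocessor default untouched.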

treat_all_integers_as_numerical: bool | None = None
use_decision_tree_bins: bool | None = None
use_decision_tree_knots: bool | None = None
class deeptab.configs.BaseModelConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int')

Shared architecture hyperparameters for all DeepTab models.

This class contains only architectural / structural configuration. Training-related parameters (lr, weight_decay, max_epochs, …) belong in TrainerConfig. Preprocessing parameters belong in PreprocessingConfig.

Parameters:
  • use_embeddings (bool) – Whether to use embedding layers for numerical/categorical features.

  • embedding_activation (Callable) – Activation function applied to embeddings.

  • embedding_type (str) – Type of embedding ("linear", "plr", etc.).

  • embedding_bias (bool) – Whether to add a bias term to embedding layers.

  • layer_norm_after_embedding (bool) – Whether to apply layer normalisation after the embedding layer.

  • d_model (int) – Embedding / model dimensionality.

  • plr_lite (bool) – Whether to use the lightweight PLR embedding variant.

  • n_frequencies (int) – Number of frequency components for PLR embeddings.

  • frequencies_init_scale (float) – Initial scale for PLR frequency components.

  • embedding_projection (bool) – Whether to apply a linear projection after embeddings.

  • batch_norm (bool) – Whether to use batch normalisation in the model body.

  • layer_norm (bool) – Whether to use layer normalisation in the model body.

  • layer_norm_eps (float) – Epsilon for layer normalisation numerical stability.

  • activation (Callable) – Activation function used throughout the model body.

  • cat_encoding (str) – How categorical features are encoded at the model input ("int", "one-hot", "linear").
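n_frequencies and frequencies_init_scale only matter for PLR (periodic, linear, ReLU) embeddings. The sketch below covers the periodic step alone, under the assumption that frequencies are drawn from a zero-mean normal distribution with standard deviation frequencies_init_scale. Both helper functions are hypothetical, and the linear layer and ReLU that follow in a full PLR embedding are omitted.

```python
import math
import random
from typing import List


def init_frequencies(n_frequencies: int, init_scale: float, seed: int = 0) -> List[float]:
    # Assumption: frequencies start as N(0, init_scale^2) samples.
    rng = random.Random(seed)
    return [rng.gauss(0.0, init_scale) for _ in range(n_frequencies)]


def periodic_features(x: float, frequencies: List[float]) -> List[float]:
    """Map one scalar to [sin(2*pi*f*x) ..., cos(2*pi*f*x) ...]."""
    angles = [2.0 * math.pi * f * x for f in frequencies]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


freqs = init_frequencies(n_frequencies=48, init_scale=0.01)
feats = periodic_features(0.5, freqs)  # one numerical feature value
```

Each scalar feature becomes a vector of length 2 * n_frequencies, which a subsequent linear layer would map to d_model.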

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False

Model architecture configs

Each class below extends BaseModelConfig and adds the hyperparameters specific to one model family.
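The `<factory>` default that appears for head_layer_sizes in several signatures suggests that these configs are dataclasses. The following sketch shows how a subclass can override a base default and add family-specific fields. BaseSettings and TransformerSettings are hypothetical stand-ins, and the empty-list factory is an assumption, since the real default is not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class BaseSettings:
    """Hypothetical stand-in for the shared base fields."""
    d_model: int = 32
    layer_norm: bool = False


@dataclass
class TransformerSettings(BaseSettings):
    """Hypothetical subclass: overrides one default, adds family fields."""
    d_model: int = 128
    n_heads: int = 8
    # A mutable default must come from a factory so instances do not share it.
    head_layer_sizes: list = field(default_factory=list)


a = TransformerSettings()
b = TransformerSettings(head_layer_sizes=[64, 32])
a.head_layer_sizes.append(16)  # does not leak into other instances
```

Inherited fields such as layer_norm stay available on the subclass, which is why every model config on this page lists the full set of base parameters in its signature.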

class deeptab.configs.AutoIntConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', n_layers=4, n_heads=8, attn_dropout=0.2, transformer_dim_feedforward=256, fprenorm=False, bias=True, use_cls=False, kv_compression=0.5, kv_compression_sharing='key-value')

Architecture-only configuration for AutoInt models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the transformer model.

  • n_layers (int) – Number of transformer layers.

  • n_heads (int) – Number of attention heads in the transformer.

  • attn_dropout (float) – Dropout rate for the attention mechanism.

  • transformer_dim_feedforward (int) – Dimensionality of the feed-forward layers in the transformer.

  • fprenorm (bool) – Whether to apply pre-normalization in attention layers.

  • bias (bool) – Whether to use bias in linear layers.

  • use_cls (bool) – Whether to use a CLS token for pooling instead of averaging.

  • kv_compression (float) – Compression ratio for key-value pairs.

  • kv_compression_sharing (str) – Sharing strategy for key-value compression (‘headwise’ or ‘key-value’).
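The relationship between d_model, n_heads and kv_compression can be sketched with plain arithmetic. Two assumptions are made here: d_model is split evenly across heads, as in standard multi-head attention, and kv_compression shortens the token axis of keys and values by the given ratio. attention_shapes is a hypothetical helper, and the n_tokens value is made up for the example.

```python
def attention_shapes(d_model: int, n_heads: int, n_tokens: int, kv_compression: float) -> dict:
    """Derive per-head width and compressed key/value length (illustrative)."""
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    return {
        "head_dim": d_model // n_heads,
        # Assumed rule: keys/values are projected onto a shorter token axis.
        "kv_tokens": int(n_tokens * kv_compression),
    }


# Defaults from the signature above, with a made-up count of 20 feature tokens.
shapes = attention_shapes(d_model=128, n_heads=8, n_tokens=20, kv_compression=0.5)
```

Under these assumptions the defaults give 16 dimensions per head, and each query attends over 10 compressed key/value positions instead of 20.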

attn_dropout: float = 0.2
batch_norm: bool = False
bias: bool = True
cat_encoding: str = 'int'
d_model: int = 128
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
fprenorm: bool = False
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

kv_compression: float = 0.5
kv_compression_sharing: str = 'key-value'
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
n_heads: int = 8
n_layers: int = 4
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transformer_dim_feedforward: int = 256
use_cls: bool = False
use_embeddings: bool = False
class deeptab.configs.ENODEConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=8, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', num_layers=4, layer_dim=64, tree_dim=1, depth=6, norm=None, head_layer_sizes=<factory>, head_dropout=0.3, head_skip_layers=False, head_activation=ReLU(), head_use_batch_norm=False)

Architecture-only configuration for ENODE models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Hidden dimensionality used in the ENODE model.

  • activation (Callable) – Activation function for the internal ENODE layers.

  • num_layers (int) – Number of dense layers in the model.

  • layer_dim (int) – Dimensionality of each dense layer.

  • tree_dim (int) – Dimensionality of the output from each tree leaf.

  • depth (int) – Depth of each decision tree in the ensemble.

  • norm (str | None) – Type of normalization to use in the model.

  • head_layer_sizes (list) – Sizes of the layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 8
depth: int = 6
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.3
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_dim: int = 64
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
norm: str | None = None
num_layers: int = 4
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

tree_dim: int = 1
use_embeddings: bool = False
class deeptab.configs.FTTransformerConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SELU(), cat_encoding='int', n_layers=4, n_heads=8, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', transformer_activation=ReGLU(), transformer_dim_feedforward=256, norm_first=False, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='avg', use_cls=False)

Architecture-only configuration for FTTransformer models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the transformer model.

  • activation (Callable) – Activation function for the transformer layers.

  • n_layers (int) – Number of transformer layers.

  • n_heads (int) – Number of attention heads in the transformer.

  • attn_dropout (float) – Dropout rate for the attention mechanism.

  • ff_dropout (float) – Dropout rate for the feed-forward layers.

  • norm (str) – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • transformer_activation (Callable) – Activation function for the transformer feed-forward layers.

  • transformer_dim_feedforward (int) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool) – Whether to use bias in linear layers.

  • head_layer_sizes (list) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool) – Whether to use a CLS token for pooling.
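The difference between the ‘avg’ and ‘cls’ values of pooling_method can be sketched on a small token matrix. pool_tokens is a hypothetical helper, and placing the CLS token in the first row is an assumption made for this example; the actual position depends on the model implementation.

```python
from typing import List


def pool_tokens(tokens: List[List[float]], pooling_method: str = "avg") -> List[float]:
    """Reduce an (n_tokens, d_model) nested list to one d_model vector."""
    if pooling_method == "avg":
        n = len(tokens)
        # zip(*tokens) iterates over embedding dimensions (columns).
        return [sum(column) / n for column in zip(*tokens)]
    if pooling_method == "cls":
        # Assumption: the CLS token is stored as the first row.
        return list(tokens[0])
    raise ValueError(f"Unsupported pooling_method: {pooling_method!r}")


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = pool_tokens(tokens, "avg")
cls = pool_tokens(tokens, "cls")
```

Average pooling mixes information from every feature token, whereas CLS pooling relies on the attention layers to gather that information into a single token.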

attn_dropout: float = 0.2
batch_norm: bool = False
bias: bool = True
cat_encoding: str = 'int'
d_model: int = 128
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
ff_dropout: float = 0.1
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
n_heads: int = 8
n_layers: int = 4
norm: str = 'LayerNorm'
norm_first: bool = False
plr_lite: bool = False
pooling_method: str = 'avg'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transformer_dim_feedforward: int = 256
use_cls: bool = False
use_embeddings: bool = False
class deeptab.configs.MambaTabConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', n_layers=1, expand_factor=2, bias=False, d_conv=16, conv_bias=True, dropout=0.05, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, axis=1, head_layer_sizes=<factory>, head_dropout=0.0, head_skip_layers=False, head_activation=ReLU(), head_use_batch_norm=False, norm='LayerNorm', use_pscan=False, mamba_version='mamba-torch', bidirectional=False)

Architecture-only configuration for MambaTab models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the model.

  • n_layers (int) – Number of layers in the model.

  • expand_factor (int) – Expansion factor for the feed-forward layers.

  • bias (bool) – Whether to use bias in the linear layers.

  • d_conv (int) – Kernel size of the 1D convolution in each Mamba block.

  • conv_bias (bool) – Whether to use bias in the convolutional layers.

  • dropout (float) – Dropout rate for regularization.

  • dt_rank (str) – Rank of the low-rank projection that produces the state-space time step Δ (dt), or "auto" to choose it automatically.

  • d_state (int) – Dimensionality of the state-space model (SSM) hidden state.

  • dt_scale (float) – Scaling factor applied when initializing the Δ (dt) projection.

  • dt_init (str) – Initialization method for the Δ (dt) projection (e.g. "random").

  • dt_max (float) – Upper bound of the range from which the initial time step Δ is drawn.

  • dt_min (float) – Lower bound of the range from which the initial time step Δ is drawn.

  • dt_init_floor (float) – Minimum value to which the initial time step Δ is clamped.

  • axis (int) – Axis along which operations are applied, if applicable.

  • head_layer_sizes (list) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • norm (str) – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • use_pscan (bool) – Whether to use PSCAN for the state-space model.

  • mamba_version (str) – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).

  • bidirectional (bool) – Whether to process data bidirectionally.
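The dt_* parameters control the time step Δ of the state-space model. The sketch below follows the rules used by the reference Mamba implementation: dt_rank="auto" resolves to ceil(d_model / 16), the inner width is expand_factor * d_model, and the initial Δ is drawn log-uniformly between dt_min and dt_max and then clamped to dt_init_floor. Whether DeepTab follows these rules exactly is an assumption, and both helpers are hypothetical.

```python
import math
import random


def mamba_sizes(d_model: int, expand_factor: int, dt_rank="auto") -> dict:
    """Derive the inner width and the rank of the dt projection."""
    rank = math.ceil(d_model / 16) if dt_rank == "auto" else int(dt_rank)
    return {"d_inner": expand_factor * d_model, "dt_rank": rank}


def sample_dt(dt_min: float, dt_max: float, dt_init_floor: float, rng: random.Random) -> float:
    """Draw one initial time step, log-uniform in [dt_min, dt_max]."""
    log_dt = rng.random() * (math.log(dt_max) - math.log(dt_min)) + math.log(dt_min)
    # Clamp from below so the time step never starts too close to zero.
    return max(math.exp(log_dt), dt_init_floor)


sizes = mamba_sizes(d_model=64, expand_factor=2)  # MambaTabConfig defaults
rng = random.Random(0)
dts = [sample_dt(1e-4, 0.1, 1e-4, rng) for _ in range(5)]
```

Sampling on a log scale spreads the initial time steps evenly across orders of magnitude, so small and large values of Δ are equally represented at the start of training.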

axis: int = 1
batch_norm: bool = False
bias: bool = False
bidirectional: bool = False
cat_encoding: str = 'int'
conv_bias: bool = True
d_conv: int = 16
d_model: int = 64
d_state: int = 128
dropout: float = 0.05
dt_init: str = 'random'
dt_init_floor: float = 0.0001
dt_max: float = 0.1
dt_min: float = 0.0001
dt_rank: str = 'auto'
dt_scale: float = 1.0
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
expand_factor: int = 2
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.0
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
mamba_version: str = 'mamba-torch'
n_frequencies: int = 48
n_layers: int = 1
norm: str = 'LayerNorm'
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False
use_pscan: bool = False
class deeptab.configs.MambAttentionConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SiLU(), cat_encoding='int', n_layers=4, expand_factor=2, n_heads=8, last_layer='attn', n_mamba_per_attention=1, bias=False, d_conv=4, conv_bias=True, dropout=0.0, attn_dropout=0.2, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, norm='LayerNorm', AD_weight_decay=True, BC_layer_norm=False, shuffle_embeddings=False, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='avg', bidirectional=False, use_learnable_interaction=False, use_cls=False, use_pscan=False, n_attention_layers=1)

Architecture-only configuration for MambAttention models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the model.

  • activation (Callable) – Activation function for the model.

  • n_layers (int) – Number of layers in the model.

  • expand_factor (int) – Expansion factor for the feed-forward layers.

  • n_heads (int) – Number of attention heads in the model.

  • last_layer (str) – Type of the last layer (e.g., ‘attn’).

  • n_mamba_per_attention (int) – Number of Mamba blocks per attention layer.

  • bias (bool) – Whether to use bias in the linear layers.

  • d_conv (int) – Kernel size of the 1D convolution in each Mamba block.

  • conv_bias (bool) – Whether to use bias in the convolutional layers.

  • dropout (float) – Dropout rate for regularization.

  • attn_dropout (float) – Dropout rate for the attention mechanism.

  • dt_rank (str) – Rank of the low-rank projection that produces the state-space time step Δ (dt), or "auto" to choose it automatically.

  • d_state (int) – Dimensionality of the state-space model (SSM) hidden state.

  • dt_scale (float) – Scaling factor applied when initializing the Δ (dt) projection.

  • dt_init (str) – Initialization method for the Δ (dt) projection (e.g. "random").

  • dt_max (float) – Upper bound of the range from which the initial time step Δ is drawn.

  • dt_min (float) – Lower bound of the range from which the initial time step Δ is drawn.

  • dt_init_floor (float) – Minimum value to which the initial time step Δ is clamped.

  • norm (str) – Type of normalization used in the model.

  • AD_weight_decay (bool) – Whether weight decay is applied to A-D matrices.

  • BC_layer_norm (bool) – Whether to apply layer normalization to B-C matrices.

  • shuffle_embeddings (bool) – Whether to shuffle embeddings before passing to Mamba layers.

  • head_layer_sizes (list) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to be used (‘avg’, ‘max’, etc.).

  • bidirectional (bool) – Whether to process input sequences bidirectionally.

  • use_learnable_interaction (bool) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool) – Whether to append a CLS token for sequence pooling.

  • use_pscan (bool) – Whether to use PSCAN for the state-space model.

  • n_attention_layers (int) – Number of attention layers in the model.

AD_weight_decay: bool = True
BC_layer_norm: bool = False
attn_dropout: float = 0.2
batch_norm: bool = False
bias: bool = False
bidirectional: bool = False
cat_encoding: str = 'int'
conv_bias: bool = True
d_conv: int = 4
d_model: int = 64
d_state: int = 128
dropout: float = 0.0
dt_init: str = 'random'
dt_init_floor: float = 0.0001
dt_max: float = 0.1
dt_min: float = 0.0001
dt_rank: str = 'auto'
dt_scale: float = 1.0
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
expand_factor: int = 2
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
last_layer: str = 'attn'
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_attention_layers: int = 1
n_frequencies: int = 48
n_heads: int = 8
n_layers: int = 4
n_mamba_per_attention: int = 1
norm: str = 'LayerNorm'
plr_lite: bool = False
pooling_method: str = 'avg'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

shuffle_embeddings: bool = False
use_cls: bool = False
use_embeddings: bool = False
use_learnable_interaction: bool = False
use_pscan: bool = False
class deeptab.configs.MambularConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=64, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SiLU(), cat_encoding='int', n_layers=4, d_conv=4, dilation=1, expand_factor=2, bias=False, dropout=0.0, dt_rank='auto', d_state=128, dt_scale=1.0, dt_init='random', dt_max=0.1, dt_min=0.0001, dt_init_floor=0.0001, norm='RMSNorm', conv_bias=False, AD_weight_decay=True, BC_layer_norm=False, shuffle_embeddings=False, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='avg', bidirectional=False, use_learnable_interaction=False, use_cls=False, use_pscan=False, mamba_version='mamba-torch')

Architecture-only configuration for Mambular models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the model.

  • activation (Callable) – Activation function for the model.

  • n_layers (int) – Number of layers in the model.

  • d_conv (int) – Size of convolution over columns.

  • dilation (int) – Dilation factor for the convolution.

  • expand_factor (int) – Expansion factor for the feed-forward layers.

  • bias (bool) – Whether to use bias in the linear layers.

  • dropout (float) – Dropout rate for regularization.

  • dt_rank (str) – Rank of the low-rank projection that produces the state-space time step Δ (dt), or "auto" to choose it automatically.

  • d_state (int) – Dimensionality of the state-space model (SSM) hidden state.

  • dt_scale (float) – Scaling factor applied when initializing the Δ (dt) projection.

  • dt_init (str) – Initialization method for the Δ (dt) projection (e.g. "random").

  • dt_max (float) – Upper bound of the range from which the initial time step Δ is drawn.

  • dt_min (float) – Lower bound of the range from which the initial time step Δ is drawn.

  • dt_init_floor (float) – Minimum value to which the initial time step Δ is clamped.

  • norm (str) – Type of normalization used (‘RMSNorm’, etc.).

  • conv_bias (bool) – Whether to use a bias in the 1D convolution before each Mamba block.

  • AD_weight_decay (bool) – Whether to also apply weight decay to the A and D matrices in Mamba.

  • BC_layer_norm (bool) – Whether to apply layer normalization to the B and C matrices.

  • shuffle_embeddings (bool) – Whether to shuffle embeddings before being passed to Mamba layers.

  • head_layer_sizes (list) – Sizes of the layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to use (‘avg’, ‘max’, etc.).

  • bidirectional (bool) – Whether to process data bidirectionally.

  • use_learnable_interaction (bool) – Whether to use learnable feature interactions before passing through Mamba blocks.

  • use_cls (bool) – Whether to append a CLS token to the input sequences.

  • use_pscan (bool) – Whether to use PSCAN for the state-space model.

  • mamba_version (str) – Version of the Mamba model to use (‘mamba-torch’, ‘mamba1’, ‘mamba2’).
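
The dt_* parameters above control how the SSM step size Δ is initialized. The following is a minimal pure-Python sketch of the scheme used by the reference Mamba implementation (log-uniform sampling between dt_min and dt_max, then a floor). Whether DeepTab reproduces it exactly is an assumption, and the helper names resolve_dt_rank and sample_dt_init are hypothetical.

```python
import math
import random


def resolve_dt_rank(dt_rank, d_model):
    """Resolve dt_rank='auto' to ceil(d_model / 16), as in reference Mamba."""
    return math.ceil(d_model / 16) if dt_rank == "auto" else int(dt_rank)


def sample_dt_init(n, dt_min=1e-4, dt_max=0.1, dt_init_floor=1e-4, seed=0):
    """Draw n initial step sizes log-uniformly from [dt_min, dt_max]."""
    rng = random.Random(seed)
    log_min, log_max = math.log(dt_min), math.log(dt_max)
    samples = [
        math.exp(rng.random() * (log_max - log_min) + log_min) for _ in range(n)
    ]
    # Clamp from below so that no step size starts out vanishingly small.
    return [max(dt, dt_init_floor) for dt in samples]


rank = resolve_dt_rank("auto", d_model=64)  # 64 / 16 = 4
dts = sample_dt_init(n=8)
```

Sampling in log space spreads the initial step sizes evenly across orders of magnitude, which is why dt_min and dt_max can sit three decades apart.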

AD_weight_decay: bool = True
BC_layer_norm: bool = False
batch_norm: bool = False
bias: bool = False
bidirectional: bool = False
cat_encoding: str = 'int'
conv_bias: bool = False
d_conv: int = 4
d_model: int = 64
d_state: int = 128
dilation: int = 1
dropout: float = 0.0
dt_init: str = 'random'
dt_init_floor: float = 0.0001
dt_max: float = 0.1
dt_min: float = 0.0001
dt_rank: str = 'auto'
dt_scale: float = 1.0
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
expand_factor: int = 2
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
mamba_version: str = 'mamba-torch'
n_frequencies: int = 48
n_layers: int = 4
norm: str = 'RMSNorm'
plr_lite: bool = False
pooling_method: str = 'avg'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
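
get_params and set_params follow the scikit-learn estimator convention, so a config can be inspected, modified, and rebuilt from its own parameters. The sketch below illustrates that round trip with a small stand-in dataclass so that it runs without DeepTab installed. ToyConfig is hypothetical and only mimics the documented behaviour.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a config class; not part of deeptab."""

    d_model: int = 64
    n_layers: int = 4
    dropout: float = 0.0

    def get_params(self, deep=True):
        # Map every dataclass field name to its current value.
        return {f.name: getattr(self, f.name) for f in fields(self)}

    def set_params(self, **params):
        valid = {f.name for f in fields(self)}
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


cfg = ToyConfig().set_params(d_model=128, dropout=0.1)
params = cfg.get_params()
clone = ToyConfig(**params)  # rebuild an equal config from its parameters
```

The same pattern is what lets hyperparameter search tools clone and reconfigure an estimator between trials.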

shuffle_embeddings: bool = False
use_cls: bool = False
use_embeddings: bool = False
use_learnable_interaction: bool = False
use_pscan: bool = False
class deeptab.configs.MLPConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', layer_sizes=<factory>, dropout=0.2, use_glu=False, skip_connections=False)[source]

Architecture-only configuration for MLP models (DeepTab 2.0 API).

Contains only structural hyperparameters. Training parameters (lr, max_epochs, …) go in TrainerConfig and preprocessing parameters go in PreprocessingConfig.

Parameters:
  • layer_sizes (list) – Number of units in each hidden layer.

  • activation (Callable) – Activation function for the MLP layers.

  • dropout (float) – Dropout rate applied after each hidden layer.

  • use_glu (bool) – Whether to use Gated Linear Units instead of the plain activation.

  • skip_connections (bool) – Whether to use residual/skip connections between layers.
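
The signature shows layer_sizes=<factory>, which is how Python dataclasses render a field whose default comes from default_factory. A minimal sketch of why that matters for list-valued fields follows. ToyMLPConfig is a hypothetical stand-in, and the sizes in it are illustrative rather than the library's actual defaults.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMLPConfig:
    """Hypothetical stand-in mirroring a few MLPConfig fields."""

    # Illustrative sizes only; the real default is not shown in this reference.
    layer_sizes: list = field(default_factory=lambda: [256, 128, 32])
    dropout: float = 0.2
    use_glu: bool = False
    skip_connections: bool = False


a = ToyMLPConfig()
b = ToyMLPConfig()
a.layer_sizes.append(16)  # mutating one instance leaves the other untouched
```

Because the factory runs once per instance, two configs never share the same list object.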

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
dropout: float = 0.2
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
layer_sizes: list
n_frequencies: int = 48
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

skip_connections: bool = False
use_embeddings: bool = False
use_glu: bool = False
class deeptab.configs.NDTFConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', min_depth=4, max_depth=16, temperature=0.1, node_sampling=0.3, lamda=0.3, n_ensembles=12, penalty_factor=1e-08)[source]

Architecture-only configuration for NDTF models (DeepTab 2.0 API).

Parameters:
  • min_depth (int) – Minimum depth of trees in the forest. Controls the simplest model structure.

  • max_depth (int) – Maximum depth of trees in the forest. Controls the maximum complexity of the trees.

  • temperature (float) – Temperature parameter for softening the node decisions during path probability calculation.

  • node_sampling (float) – Fraction of nodes sampled for regularization penalty calculation. Reduces computation by focusing on a subset of nodes.

  • lamda (float) – Regularization parameter to control the complexity of the paths, penalizing overconfident or imbalanced paths.

  • n_ensembles (int) – Number of trees in the forest.

  • penalty_factor (float) – Multiplicative factor applied to the regularization penalty.
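
To make the role of temperature concrete, the sketch below computes a soft path probability as the product of per-node routing probabilities. Dividing the node logit by the temperature is an assumption about how the library applies it, and the function names are hypothetical.

```python
import math


def soft_decision(logit, temperature=0.1):
    """Probability of routing right at a node; lower temperature gives harder splits."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def path_probability(logits, directions, temperature=0.1):
    """Product of per-node routing probabilities along one root-to-leaf path.

    directions[i] is 1 for 'right' and 0 for 'left' at the i-th node.
    """
    prob = 1.0
    for logit, go_right in zip(logits, directions):
        p_right = soft_decision(logit, temperature)
        prob *= p_right if go_right else 1.0 - p_right
    return prob


# The same path under a high and a low temperature.
soft = path_probability([0.05, -0.02], [1, 0], temperature=1.0)
sharp = path_probability([0.05, -0.02], [1, 0], temperature=0.01)
```

With a high temperature every path keeps a probability near 0.5 per node; lowering it concentrates the mass on the path the logits favour.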

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

lamda: float = 0.3
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
max_depth: int = 16
min_depth: int = 4
n_ensembles: int = 12
n_frequencies: int = 48
node_sampling: float = 0.3
penalty_factor: float = 1e-08
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

temperature: float = 0.1
use_embeddings: bool = False
class deeptab.configs.NODEConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', num_layers=4, layer_dim=128, tree_dim=1, depth=6, norm=None, head_layer_sizes=<factory>, head_dropout=0.3, head_skip_layers=False, head_activation=ReLU(), head_use_batch_norm=False)[source]

Architecture-only configuration for NODE models (DeepTab 2.0 API).

Parameters:
  • num_layers (int) – Number of dense layers in the model.

  • layer_dim (int) – Dimensionality of each dense layer.

  • tree_dim (int) – Dimensionality of the output from each tree leaf.

  • depth (int) – Depth of each decision tree in the ensemble.

  • norm (str | None) – Type of normalization to use in the model.

  • head_layer_sizes (list) – Sizes of the layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.
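
As a rough illustration of what depth and tree_dim imply, the sketch below indexes the leaves of a single oblivious decision tree, where every level applies one shared test. It uses hard comparisons for readability, whereas NODE uses differentiable soft splits, and treating layer_dim as the number of trees per layer is an assumption.

```python
def leaf_index(feature_values, thresholds):
    """Leaf reached in an oblivious tree: one shared threshold test per level."""
    index = 0
    for value, threshold in zip(feature_values, thresholds):
        # Each level contributes one bit to the leaf index.
        index = (index << 1) | int(value > threshold)
    return index


depth = 6
n_leaves = 2 ** depth                    # 64 leaves per tree
layer_dim, tree_dim = 128, 1
layer_output_dim = layer_dim * tree_dim  # assumed: one tree_dim-sized output per tree

leaf = leaf_index([0.9, 0.1, 0.4, 0.7, 0.2, 0.8], [0.5] * depth)
```

The leaf count grows exponentially with depth, so increasing depth by one doubles the number of response values each tree has to learn.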

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
depth: int = 6
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.3
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_dim: int = 128
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
norm: str | None = None
num_layers: int = 4
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

tree_dim: int = 1
use_embeddings: bool = False
class deeptab.configs.ResNetConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SELU(), cat_encoding='int', layer_sizes=<factory>, dropout=0.5, norm=False, num_blocks=3)[source]

Architecture-only configuration for ResNet models (DeepTab 2.0 API).

Parameters:
  • activation (Callable) – Activation function for the ResNet layers.

  • layer_sizes (list) – Sizes of the layers in the ResNet.

  • dropout (float) – Dropout rate for regularization.

  • norm (bool) – Whether to use normalization in the ResNet.

  • num_blocks (int) – Number of residual blocks in the ResNet.

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
dropout: float = 0.5
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
layer_sizes: list
n_frequencies: int = 48
norm: bool = False
num_blocks: int = 3
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False
class deeptab.configs.SAINTConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=GELU(approximate='none'), cat_encoding='int', n_layers=1, n_heads=2, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', norm_first=False, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='cls', use_cls=True)[source]

Architecture-only configuration for SAINT models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of embeddings or model representations.

  • activation (Callable) – Activation function for the transformer layers.

  • n_layers (int) – Number of transformer layers.

  • n_heads (int) – Number of attention heads in the transformer.

  • attn_dropout (float) – Dropout rate for the attention mechanism.

  • ff_dropout (float) – Dropout rate for the feed-forward layers.

  • norm (str) – Type of normalization to be used (‘LayerNorm’, ‘RMSNorm’, etc.).

  • norm_first (bool) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool) – Whether to use bias in linear layers.

  • head_layer_sizes (list) – Sizes of the fully connected layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to be used (‘cls’, ‘avg’, etc.).

  • use_cls (bool) – Whether to use a CLS token for pooling.
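
The pooling_method options can be pictured as different reductions over the token sequence. The following sketch is a pure-Python illustration, not the library's implementation. The position of the CLS token (index 0 here) is an assumption, and 'max' is included only because it is listed as an option for other configs in this reference.

```python
def pool(tokens, pooling_method="cls", cls_position=0):
    """Reduce a sequence of token vectors to a single vector."""
    if pooling_method == "cls":
        return list(tokens[cls_position])
    if pooling_method == "avg":
        n = len(tokens)
        return [sum(col) / n for col in zip(*tokens)]
    if pooling_method == "max":
        return [max(col) for col in zip(*tokens)]
    raise ValueError(f"Unknown pooling_method: {pooling_method!r}")


tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # 3 tokens, d_model = 2
cls_vec = pool(tokens, "cls")
avg_vec = pool(tokens, "avg")
```

Pooling with 'cls' only makes sense when use_cls is enabled, since otherwise the selected position holds an ordinary feature token.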

attn_dropout: float = 0.2
batch_norm: bool = False
bias: bool = True
cat_encoding: str = 'int'
d_model: int = 128
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
ff_dropout: float = 0.1
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
n_heads: int = 2
n_layers: int = 1
norm: str = 'LayerNorm'
norm_first: bool = False
plr_lite: bool = False
pooling_method: str = 'cls'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_cls: bool = True
use_embeddings: bool = False
class deeptab.configs.TabMConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', layer_sizes=<factory>, dropout=0.5, norm=None, use_glu=False, ensemble_size=32, ensemble_scaling_in=True, ensemble_scaling_out=True, ensemble_bias=True, scaling_init='ones', average_ensembles=False, model_type='mini', average_embeddings=True)[source]

Architecture-only configuration for TabM models (DeepTab 2.0 API).

Parameters:
  • layer_sizes (list) – Sizes of the layers in the model.

  • dropout (float) – Dropout rate for regularization.

  • norm (str | None) – Normalization method to be used, if any.

  • use_glu (bool) – Whether to use Gated Linear Units (GLU) in the model.

  • ensemble_size (int) – Number of ensemble members for batch ensembling.

  • ensemble_scaling_in (bool) – Whether to use input scaling for each ensemble member.

  • ensemble_scaling_out (bool) – Whether to use output scaling for each ensemble member.

  • ensemble_bias (bool) – Whether to use a unique bias term for each ensemble member.

  • scaling_init (Literal['ones', 'random-signs', 'normal']) – Initialization method for scaling weights.

  • average_ensembles (bool) – Whether to average the outputs of the ensembles.

  • model_type (Literal['mini', 'full']) – Model type to use (‘mini’ for reduced version, ‘full’ for complete model).

  • average_embeddings (bool) – Whether to average per-ensemble-member embeddings before the head.
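
The ensemble_* flags describe a BatchEnsemble-style layer in which all members share one weight matrix and differ only in cheap per-member vectors. A minimal pure-Python sketch of that idea follows. The function name and the exact order of operations are assumptions, not the library's code.

```python
def batch_ensemble_linear(x, weight, r, s, bias):
    """Apply one shared weight matrix with per-member scaling.

    x: input vector of length d_in
    weight: shared matrix with d_in rows and d_out columns
    r, s, bias: per-member lists of length d_in, d_out and d_out
    """
    outputs = []
    for r_k, s_k, b_k in zip(r, s, bias):
        # Input scaling (ensemble_scaling_in).
        scaled_in = [xi * ri for xi, ri in zip(x, r_k)]
        # Shared linear map: one dot product per output column.
        hidden = [sum(v * w for v, w in zip(scaled_in, col)) for col in zip(*weight)]
        # Output scaling (ensemble_scaling_out) and per-member bias (ensemble_bias).
        outputs.append([h * si + bi for h, si, bi in zip(hidden, s_k, b_k)])
    return outputs


x = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, shared by all members
r = [[1.0, 1.0], [2.0, 2.0]]       # two ensemble members
s = [[1.0, 1.0], [1.0, 0.5]]
bias = [[0.0, 0.0], [1.0, 1.0]]
member_outputs = batch_ensemble_linear(x, weight, r, s, bias)
# Averaging across members, as average_ensembles=True would suggest.
averaged = [sum(col) / len(member_outputs) for col in zip(*member_outputs)]
```

Each extra member adds only d_in + 2 * d_out parameters per layer, which is what makes an ensemble_size of 32 affordable.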

average_embeddings: bool = True
average_ensembles: bool = False
batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
dropout: float = 0.5
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
ensemble_bias: bool = True
ensemble_scaling_in: bool = True
ensemble_scaling_out: bool = True
ensemble_size: int = 32
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
layer_sizes: list
model_type: Literal['mini', 'full'] = 'mini'
n_frequencies: int = 48
norm: str | None = None
plr_lite: bool = False
scaling_init: Literal['ones', 'random-signs', 'normal'] = 'ones'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False
use_glu: bool = False
class deeptab.configs.TabRConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='plr', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=True, n_frequencies=75, frequencies_init_scale=0.045, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', d_main=256, context_dropout=0.38920071545944357, d_multiplier=2, encoder_n_blocks=0, predictor_n_blocks=1, mixer_normalization='auto', dropout0=0.38852797479169876, dropout1=0.0, normalization='LayerNorm', memory_efficient=False, candidate_encoding_batch_size=0, context_size=96)[source]

Architecture-only configuration for TabR models (DeepTab 2.0 API).

Training fields (lr, weight_decay, lr_factor) are configured via TrainerConfig.

Parameters:
  • embedding_type (str) – Type of feature embedding to use (e.g., ‘plr’, ‘ple’).

  • plr_lite (bool) – Whether to use the lightweight PLR embedding variant.

  • n_frequencies (int) – Number of random Fourier feature frequencies.

  • frequencies_init_scale (float) – Scale for initializing Fourier feature frequencies.

  • d_main (int) – Main hidden dimensionality of the predictor network.

  • context_dropout (float) – Dropout applied to context (candidate) representations.

  • d_multiplier (int) – Multiplier for intermediate dimensions inside the predictor.

  • encoder_n_blocks (int) – Number of residual blocks in the feature encoder.

  • predictor_n_blocks (int) – Number of residual blocks in the predictor network.

  • mixer_normalization (str) – Normalization strategy for the mixer ('auto' selects adaptively).

  • dropout0 (float) – Dropout rate on the first linear projection.

  • dropout1 (float) – Dropout rate on the second linear projection.

  • normalization (str) – Type of normalization layer to use.

  • memory_efficient (bool) – Whether to trade compute for lower memory in candidate lookups.

  • candidate_encoding_batch_size (int) – Batch size for encoding candidates (0 = full batch).

  • context_size (int) – Number of nearest-neighbour candidates to retrieve per sample.
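
context_size bounds how many candidates are retrieved for each sample. The sketch below shows a brute-force nearest-neighbour lookup under L2 distance. Retrieval-based models such as TabR compare learned representations rather than raw features, so this is a simplification, and retrieve_context is a hypothetical helper.

```python
import math


def retrieve_context(query, candidates, context_size):
    """Return indices of the context_size candidates closest to query (L2 distance)."""
    distances = [(math.dist(query, c), i) for i, c in enumerate(candidates)]
    distances.sort()  # ascending by distance, ties broken by index
    return [i for _, i in distances[:context_size]]


candidates = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.2, 0.1]]
neighbours = retrieve_context([0.0, 0.1], candidates, context_size=2)
```

A larger context_size gives the predictor more neighbours to aggregate at the cost of more memory per batch, which is where memory_efficient and candidate_encoding_batch_size come in.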

batch_norm: bool = False
candidate_encoding_batch_size: int = 0
cat_encoding: str = 'int'
context_dropout: float = 0.38920071545944357
context_size: int = 96
d_main: int = 256
d_model: int = 32
d_multiplier: int = 2
dropout0: float = 0.38852797479169876
dropout1: float = 0.0
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'plr'
encoder_n_blocks: int = 0
frequencies_init_scale: float = 0.045
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
memory_efficient: bool = False
mixer_normalization: str = 'auto'
n_frequencies: int = 75
normalization: str = 'LayerNorm'
plr_lite: bool = True
predictor_n_blocks: int = 1
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False
class deeptab.configs.TabTransformerConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SELU(), cat_encoding='int', n_layers=4, n_heads=8, attn_dropout=0.2, ff_dropout=0.1, norm='LayerNorm', transformer_activation=ReGLU(), transformer_dim_feedforward=512, norm_first=True, bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='avg')[source]

Architecture-only configuration for TabTransformer models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of embeddings or model representations.

  • activation (Callable) – Activation function for the transformer layers.

  • n_layers (int) – Number of layers in the transformer.

  • n_heads (int) – Number of attention heads in the transformer.

  • attn_dropout (float) – Dropout rate for the attention mechanism.

  • ff_dropout (float) – Dropout rate for the feed-forward layers.

  • norm (str) – Normalization method to be used.

  • transformer_activation (Callable) – Activation function for the transformer layers.

  • transformer_dim_feedforward (int) – Dimensionality of the feed-forward layers in the transformer.

  • norm_first (bool) – Whether to apply normalization before other operations in each transformer block.

  • bias (bool) – Whether to use bias in the linear layers.

  • head_layer_sizes (list) – Sizes of the layers in the model’s head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to be used (‘cls’, ‘avg’, etc.).
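
Two of the defaults above interact with tensor shapes: n_heads has to divide d_model in standard multi-head attention, and a ReGLU activation halves the width of its input. The sketch below illustrates both in pure Python. Which half acts as the gate is an assumption taken from common ReGLU implementations.

```python
def reglu(x):
    """ReGLU: split the vector in half and gate one half with ReLU of the other."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai * max(bi, 0.0) for ai, bi in zip(a, b)]


d_model, n_heads = 128, 8
assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
head_dim = d_model // n_heads  # 16 dimensions per attention head

gated = reglu([1.0, 2.0, 3.0, -1.0])  # a = [1, 2], b = [3, -1]
```

Because the gate consumes half of the input, a feed-forward layer that uses ReGLU typically projects to twice the width it needs after the activation.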

attn_dropout: float = 0.2
batch_norm: bool = False
bias: bool = True
cat_encoding: str = 'int'
d_model: int = 128
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
ff_dropout: float = 0.1
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_frequencies: int = 48
n_heads: int = 8
n_layers: int = 4
norm: str = 'LayerNorm'
norm_first: bool = True
plr_lite: bool = False
pooling_method: str = 'avg'
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

transformer_dim_feedforward: int = 512
use_embeddings: bool = False
class deeptab.configs.TabulaRNNConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=SELU(), cat_encoding='int', model_type='RNN', n_layers=4, rnn_dropout=0.2, norm='RMSNorm', residuals=False, norm_first=False, bias=True, rnn_activation='relu', dim_feedforward=256, d_conv=4, dilation=1, conv_bias=True, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False, pooling_method='avg')[source]

Architecture-only configuration for TabulaRNN models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of embeddings or model representations.

  • activation (Callable) – Activation function for the RNN layers.

  • model_type (str) – Type of model, one of “RNN”, “LSTM”, “GRU”, “mLSTM”, “sLSTM”.

  • n_layers (int) – Number of layers in the RNN.

  • rnn_dropout (float) – Dropout rate for the RNN layers.

  • norm (str) – Normalization method to be used.

  • residuals (bool) – Whether to include residual connections in the RNN.

  • norm_first (bool) – Whether to apply normalization before other operations in each block.

  • bias (bool) – Whether to use bias in the linear layers.

  • rnn_activation (str) – Activation function for the RNN layers.

  • dim_feedforward (int) – Size of the feedforward network.

  • d_conv (int) – Size of the convolutional layer for embedding features.

  • dilation (int) – Dilation factor for the convolution.

  • conv_bias (bool) – Whether to use bias in the convolutional layers.

  • head_layer_sizes (list) – Sizes of the layers in the head of the model.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.

  • pooling_method (str) – Pooling method to be used (‘avg’, ‘cls’, etc.).
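
d_conv and dilation together determine how many neighbouring feature positions one convolution sees. The helper below applies the standard receptive-field formula for a single dilated 1D convolution. The function name is hypothetical.

```python
def receptive_field(d_conv, dilation=1):
    """Number of consecutive feature positions spanned by one dilated 1D convolution."""
    return dilation * (d_conv - 1) + 1


default_span = receptive_field(d_conv=4, dilation=1)
dilated_span = receptive_field(d_conv=4, dilation=2)
```

Raising dilation widens the span without adding parameters, since the kernel still has d_conv weights per channel.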

batch_norm: bool = False
bias: bool = True
cat_encoding: str = 'int'
conv_bias: bool = True
d_conv: int = 4
d_model: int = 128
dilation: int = 1
dim_feedforward: int = 256
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
model_type: str = 'RNN'
n_frequencies: int = 48
n_layers: int = 4
norm: str = 'RMSNorm'
norm_first: bool = False
plr_lite: bool = False
pooling_method: str = 'avg'
residuals: bool = False
rnn_activation: str = 'relu'
rnn_dropout: float = 0.2
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False

Experimental model configs

class deeptab.configs.ModernNCAConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='plr', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=True, n_frequencies=75, frequencies_init_scale=0.045, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', dim=128, d_block=512, n_blocks=4, dropout=0.1, temperature=0.75, sample_rate=0.5, num_embeddings=None, head_layer_sizes=<factory>, head_dropout=0.5, head_skip_layers=False, head_activation=SELU(), head_use_batch_norm=False)[source]

Architecture-only configuration for ModernNCA models (DeepTab 2.0 API).

Parameters:
  • embedding_type (str) – Type of feature embedding to use (e.g., ‘plr’, ‘ple’).

  • plr_lite (bool) – Whether to use the lightweight PLR embedding variant.

  • n_frequencies (int) – Number of random Fourier feature frequencies.

  • frequencies_init_scale (float) – Scale for initializing Fourier feature frequencies.

  • dim (int) – Embedding dimensionality per feature.

  • d_block (int) – Hidden size of each residual block.

  • n_blocks (int) – Number of residual blocks.

  • dropout (float) – Dropout rate applied inside each block.

  • temperature (float) – Temperature scaling for NCA softmax similarity.

  • sample_rate (float) – Fraction of training candidates used per forward pass.

  • num_embeddings (dict | None) – Optional dict mapping feature indices to embedding sizes.

  • head_layer_sizes (list) – Sizes of the fully connected layers in the prediction head.

  • head_dropout (float) – Dropout rate for the head layers.

  • head_skip_layers (bool) – Whether to use skip connections in the head layers.

  • head_activation (Callable) – Activation function for the head layers.

  • head_use_batch_norm (bool) – Whether to use batch normalization in the head layers.
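
The effect of temperature on an NCA-style prediction can be sketched as a softmax over negative distances followed by a weighted average of candidate labels. Dividing the distance by the temperature is an assumption about the library's convention, and nca_predict is a hypothetical helper.

```python
import math


def nca_predict(distances, labels, temperature=0.75):
    """Softmax over negative distances, then a weighted average of candidate labels."""
    scores = [-d / temperature for d in distances]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * y for w, y in zip(weights, labels))


distances = [0.1, 0.5, 2.0]
labels = [1.0, 1.0, 0.0]
smooth = nca_predict(distances, labels, temperature=0.75)
peaked = nca_predict(distances, labels, temperature=0.05)
```

A lower temperature makes the prediction depend almost entirely on the closest candidates, while a higher one blends in more distant neighbours.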

batch_norm: bool = False
cat_encoding: str = 'int'
d_block: int = 512
d_model: int = 32
dim: int = 128
dropout: float = 0.1
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'plr'
frequencies_init_scale: float = 0.045
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

head_dropout: float = 0.5
head_layer_sizes: list
head_skip_layers: bool = False
head_use_batch_norm: bool = False
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_blocks: int = 4
n_frequencies: int = 75
num_embeddings: dict | None = None
plr_lite: bool = True
sample_rate: float = 0.5
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

temperature: float = 0.75
use_embeddings: bool = False
class deeptab.configs.TangosConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=32, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', layer_sizes=<factory>, skip_layers=False, dropout=0.2, use_glu=False, skip_connections=False, lamda1=0.5, lamda2=0.1, subsample=0.5)[source]

Architecture-only configuration for Tangos models (DeepTab 2.0 API).

Parameters:
  • activation (Callable) – Activation function for the TANGOS layers.

  • layer_sizes (list) – Sizes of the layers in the TANGOS network.

  • skip_layers (bool) – Whether to skip layers in the TANGOS network.

  • dropout (float) – Dropout rate for regularization.

  • use_glu (bool) – Whether to use Gated Linear Units (GLU) in the TANGOS network.

  • skip_connections (bool) – Whether to use skip connections in the TANGOS network.

  • lamda1 (float) – Weight on the task-specific orthogonality regularisation term.

  • lamda2 (float) – Weight on the cross-task specialisation regularisation term.

  • subsample (float) – Fraction of features subsampled for regularisation estimation.
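
One way to read lamda1 and lamda2 is as weights on two penalty terms added to the task loss. The sketch below shows only that weighting, with the documented defaults; the function name tangos_objective and its arguments are hypothetical, and how the two regularisation terms themselves are computed is not covered here.

```python
def tangos_objective(task_loss, reg_term_1, reg_term_2, lamda1=0.5, lamda2=0.1):
    """Combine a task loss with two weighted penalty terms (assumed additive form)."""
    return task_loss + lamda1 * reg_term_1 + lamda2 * reg_term_2


# Illustrative numbers only, not measured values.
total = tangos_objective(task_loss=1.0, reg_term_1=0.4, reg_term_2=2.0)

# Setting both weights to zero leaves the plain task loss.
unregularised = tangos_objective(1.0, 0.4, 2.0, lamda1=0.0, lamda2=0.0)
```

Under this reading, raising either weight increases the influence of its penalty relative to the task loss.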

batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 32
dropout: float = 0.2
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

lamda1: float = 0.5
lamda2: float = 0.1
layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
layer_sizes: list
n_frequencies: int = 48
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
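
The <component>__<parameter> convention mentioned above can be sketched as a simple key split on the first double underscore. The helper split_nested_params and the component name "model" are hypothetical and serve only to show how such keys separate into own and nested parameters; this is not the library's code.

```python
from collections import defaultdict


def split_nested_params(params):
    """Separate plain parameters from `<component>__<parameter>` ones."""
    own = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # Split on the first double underscore only; the rest stays with the sub-key.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested[component][sub_key] = value
        else:
            own[key] = value
    return own, dict(nested)


own, nested = split_nested_params(
    {"dropout": 0.3, "model__lamda1": 0.25, "model__lamda2": 0.05}
)
```

Keys without a double underscore apply to the object itself, while prefixed keys would be forwarded to the named component.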

skip_connections: bool = False
skip_layers: bool = False
subsample: float = 0.5
use_embeddings: bool = False
use_glu: bool = False
class deeptab.configs.TromptConfig(use_embeddings=False, embedding_activation=Identity(), embedding_type='linear', embedding_bias=False, layer_norm_after_embedding=False, d_model=128, plr_lite=False, n_frequencies=48, frequencies_init_scale=0.01, embedding_projection=True, batch_norm=False, layer_norm=False, layer_norm_eps=1e-05, activation=ReLU(), cat_encoding='int', n_cycles=6, n_cells=4, P=128)[source]

Architecture-only configuration for Trompt models (DeepTab 2.0 API).

Parameters:
  • d_model (int) – Dimensionality of the transformer model.

  • n_cycles (int) – Number of cycles in the Trompt model.

  • n_cells (int) – Number of cells in each cycle.

  • P (int) – Number of steps in the Trompt model.
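
When tuning these four parameters, a small grid of candidate settings can be built with the standard library alone. The sketch below starts from the documented defaults; the search values are illustrative assumptions, not recommendations, and deeptab is not imported.

```python
from itertools import product

# Documented defaults for the Trompt-specific parameters.
defaults = {"d_model": 128, "n_cycles": 6, "n_cells": 4, "P": 128}

# Hypothetical search values, chosen only for illustration.
search_space = {"n_cycles": [3, 6], "n_cells": [2, 4], "d_model": [64, 128]}

names = list(search_space)
candidates = [
    {**defaults, **dict(zip(names, values))}
    for values in product(*(search_space[name] for name in names))
]
```

Each dictionary in candidates uses only names that appear in the signature above, so it could be unpacked as keyword arguments when constructing a config.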

P: int = 128
batch_norm: bool = False
cat_encoding: str = 'int'
d_model: int = 128
embedding_bias: bool = False
embedding_projection: bool = True
embedding_type: str = 'linear'
frequencies_init_scale: float = 0.01
get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

layer_norm: bool = False
layer_norm_after_embedding: bool = False
layer_norm_eps: float = 1e-05
n_cells: int = 4
n_cycles: int = 6
n_frequencies: int = 48
plr_lite: bool = False
set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

use_embeddings: bool = False