Config System
DeepTab uses a split-config API. Architecture, preprocessing, and training settings are kept in separate dataclasses so experiments can change one layer without mixing concerns.
Important
The model constructor accepts model_config, preprocessing_config, and trainer_config. Flat constructor arguments are legacy compatibility only; new documentation and experiments should use split configs.
The Three Config Layers
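| Config | Scope | Examples |
|---|---|---|
| model_config (e.g. MambularConfig) | Neural architecture | d_model, n_layers, dropout |
| preprocessing_config (PreprocessingConfig) | Arguments passed to pretab.Preprocessor | numerical_preprocessing, n_bins, scaling_strategy |
| trainer_config (TrainerConfig) | Training loop and optimizer | max_epochs, batch_size, lr, optimizer_type |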
All three are optional. If omitted, DeepTab creates default config objects internally.
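The "optional, defaulted internally" behaviour can be pictured with a few stand-in dataclasses. This is a minimal sketch, not DeepTab's implementation: the class names DefaultingEstimator, ModelConfig, PreprocessingConfig and TrainerConfig here are hypothetical stand-ins with placeholder fields, and the sketch resolves defaults in __init__ only for brevity.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the three config layers (not the real DeepTab classes).
@dataclass
class ModelConfig:
    d_model: int = 64


@dataclass
class PreprocessingConfig:
    numerical_preprocessing: Optional[str] = None


@dataclass
class TrainerConfig:
    max_epochs: int = 100


class DefaultingEstimator:
    """Each config slot is optional; a missing one becomes a fresh default object."""

    def __init__(self, model_config=None, preprocessing_config=None, trainer_config=None):
        self.model_config = model_config if model_config is not None else ModelConfig()
        self.preprocessing_config = (
            preprocessing_config if preprocessing_config is not None else PreprocessingConfig()
        )
        self.trainer_config = trainer_config if trainer_config is not None else TrainerConfig()


est = DefaultingEstimator(trainer_config=TrainerConfig(max_epochs=30))
print(est.model_config, est.trainer_config)  # ModelConfig(d_model=64) TrainerConfig(max_epochs=30)
```

Only the slot you pass is customised; the other two fall back to their defaults.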
Where to find every field
Each config has a complete, authoritative field reference. Use the table below as the index.
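| Config | Full field reference |
|---|---|
| Model configs | Shared fields in BaseModelConfig; model-specific fields on the model-zoo pages or in the API reference |
| PreprocessingConfig | The Preprocessing Config table below |
| TrainerConfig | The Trainer Config table below |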
Tip
At runtime you can list the fields of any config without leaving Python: MambularConfig().get_params(deep=False) returns the field-to-value mapping, and the same call works on PreprocessingConfig and TrainerConfig.
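Because every config is a dataclass, a field listing like this can be derived directly from the dataclass fields. The sketch below is an assumption about how such a method could work, not DeepTab's source; SketchConfig and its two fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    # Hypothetical fields standing in for a real model config.
    d_model: int = 64
    n_layers: int = 4

    def get_params(self, deep=False):
        # `deep` is accepted for scikit-learn compatibility; a flat config has nothing nested.
        return {f.name: getattr(self, f.name) for f in fields(self)}


print(SketchConfig(d_model=32).get_params(deep=False))  # {'d_model': 32, 'n_layers': 4}
```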
Keeping each config in the right slot
Each config belongs to a specific constructor argument: a model config goes to model_config, a PreprocessingConfig to preprocessing_config, and a TrainerConfig to trainer_config. The estimator does not reorder them for you and does not guess intent from the object type.
If you pass a config to the wrong slot, DeepTab now detects it and emits a ConfigWarning that names the offending object and the slot it landed in:
from deeptab.configs import MambularConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import MambularClassifier
# TrainerConfig accidentally passed where the model config belongs
MambularClassifier(model_config=TrainerConfig())
# ConfigWarning: TrainerConfig was passed as 'model_config', but 'model_config'
# expects a BaseModelConfig. Configs are not reordered for you, so this one will
# be misused or silently ignored. Pass it as its matching argument instead.
Warning
The check warns rather than raises, so construction still succeeds. A misplaced config is then misused or silently ignored: for example a wrong preprocessing_config falls back to default preprocessing, and a wrong trainer_config falls back to the default optimizer. Treat this warning as an error in your own code and fix the argument it points to.
Note
The warning only fires for a recognised DeepTab config sitting in the wrong slot. Genuinely custom or duck-typed objects (for example test doubles) are left untouched, so the check never gets in the way of advanced extension code.
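The slot check described above can be sketched with the standard warnings module. This is an illustrative reconstruction under stated assumptions: the *Stub classes and this ConfigWarning are hypothetical stand-ins, and DeepTab's actual check and message wording may differ. The key property is that only recognised config types trigger the warning, so None and duck-typed objects pass through untouched.

```python
import warnings


class ConfigWarning(UserWarning):
    """Stand-in for DeepTab's ConfigWarning."""


# Empty stand-ins for the recognised config classes.
class BaseModelConfigStub:
    pass


class PreprocessingConfigStub:
    pass


class TrainerConfigStub:
    pass


# The recognised config class each constructor slot expects.
EXPECTED = {
    "model_config": BaseModelConfigStub,
    "preprocessing_config": PreprocessingConfigStub,
    "trainer_config": TrainerConfigStub,
}
RECOGNISED = tuple(EXPECTED.values())


def check_slots(**slots):
    """Warn when a recognised config sits in another config's slot."""
    for slot, obj in slots.items():
        expected = EXPECTED[slot]
        # None and duck-typed objects are not recognised configs, so they pass silently.
        if isinstance(obj, RECOGNISED) and not isinstance(obj, expected):
            warnings.warn(
                f"{type(obj).__name__} was passed as {slot!r}, but {slot!r} "
                f"expects a {expected.__name__}. Pass it as its matching argument instead.",
                ConfigWarning,
                stacklevel=2,
            )


check_slots(model_config=TrainerConfigStub())  # emits ConfigWarning, does not raise
check_slots(model_config=object())             # duck-typed object: no warning
```

Warning instead of raising keeps construction working. If you want the stricter behaviour in your own code, you can escalate with warnings.simplefilter("error", ConfigWarning).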
Passing a field to the wrong config
A related mistake is putting the right kind of value on the wrong config, for example a model field such as d_model on a TrainerConfig, or a trainer field such as lr on a PreprocessingConfig. This case does not need a DeepTab warning because it already fails fast and clearly through the underlying machinery.
Each config is a dataclass, so an unknown field is rejected the moment you build it:
from deeptab.configs import TrainerConfig
TrainerConfig(d_model=64)
# TypeError: TrainerConfig.__init__() got an unexpected keyword argument 'd_model'
The same protection applies through set_params, where scikit-learn validates the nested field name:
model.set_params(trainer_config__d_model=64)
# ValueError: Invalid parameter 'd_model' for estimator TrainerConfig(...).
Note
The two mistakes fail in deliberately different ways. A whole config in the wrong slot is duck-typed and only triggers an advisory ConfigWarning, because a custom object might legitimately stand in for a config. A wrong field name has no such ambiguity, so it raises immediately. If you are unsure which config owns a field, check the field reference index above or call Config().get_params(deep=False) to list its valid fields.
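The fail-fast behaviour for a wrong field name comes from the dataclass-generated __init__ and can be reproduced with any dataclass. In the sketch below, TrainerConfigSketch and the owns_field helper are hypothetical; they only show the mechanism and a quick way to check which config owns a field.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainerConfigSketch:
    # Hypothetical subset of trainer fields, for illustration only.
    max_epochs: int = 100
    lr: float = 1e-4


def owns_field(config_cls, name):
    """True if `name` is a declared field of the dataclass `config_cls`."""
    return name in {f.name for f in fields(config_cls)}


try:
    TrainerConfigSketch(d_model=64)  # a model field on a trainer config
except TypeError as err:
    # The generated __init__ rejects it: "... got an unexpected keyword argument 'd_model'"
    print(err)

print(owns_field(TrainerConfigSketch, "lr"), owns_field(TrainerConfigSketch, "d_model"))  # True False
```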
Model Configs
Every architecture has a dedicated config class:
from deeptab.configs import MambularConfig
from deeptab.models import MambularClassifier
model = MambularClassifier(
model_config=MambularConfig(
d_model=64,
n_layers=4,
dropout=0.0,
pooling_method="avg",
)
)
Model configs inherit shared embedding and architecture fields from BaseModelConfig, including use_embeddings, embedding_type, d_model, batch_norm, layer_norm, activation, and cat_encoding. Individual models add their own fields; use the model-zoo pages or API reference for model-specific details.
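The inheritance pattern can be sketched with plain dataclasses. The class names below are hypothetical, the base class carries only a few of the shared fields named above, and all default values are placeholders rather than DeepTab's actual defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseModelConfigSketch:
    # A few of the shared fields named above; the defaults are placeholders.
    use_embeddings: bool = False
    d_model: int = 64
    batch_norm: bool = False
    layer_norm: bool = False
    activation: str = "relu"


@dataclass
class MambularConfigSketch(BaseModelConfigSketch):
    # Model-specific fields are appended after the inherited ones.
    n_layers: int = 4
    dropout: float = 0.0
    pooling_method: str = "avg"


cfg = MambularConfigSketch(d_model=128, n_layers=6)
print([f.name for f in fields(cfg)])
```

A subclass accepts both inherited and model-specific fields as keyword arguments, which is why d_model and n_layers can be set side by side in the MambularConfig example.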
Preprocessing Config
PreprocessingConfig is a thin wrapper around the supported pretab.Preprocessor keyword arguments. Fields set to None are omitted, leaving the preprocessor default in effect.
from deeptab.configs import PreprocessingConfig
preprocessing_config = PreprocessingConfig(
numerical_preprocessing="quantile",
categorical_preprocessing="int",
n_bins=50,
scaling_strategy="minmax",
)
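The rule that None fields are omitted amounts to filtering the dataclass before forwarding it. In this minimal sketch, PreprocessingConfigSketch and to_preprocessor_kwargs are hypothetical names; how DeepTab actually builds the pretab.Preprocessor call is not shown here.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PreprocessingConfigSketch:
    # Hypothetical subset of the documented fields; None means "not set".
    numerical_preprocessing: Optional[str] = None
    categorical_preprocessing: Optional[str] = None
    n_bins: Optional[int] = None
    scaling_strategy: Optional[str] = None

    def to_preprocessor_kwargs(self):
        # Drop unset (None) fields so the preprocessor keeps its own defaults.
        return {k: v for k, v in asdict(self).items() if v is not None}


cfg = PreprocessingConfigSketch(numerical_preprocessing="quantile", n_bins=50)
print(cfg.to_preprocessor_kwargs())  # {'numerical_preprocessing': 'quantile', 'n_bins': 50}
```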
Valid fields:
| Field | Purpose |
|---|---|
| numerical_preprocessing | Main numerical transform, e.g. "quantile", "ple", or "standardization". |
| categorical_preprocessing | Categorical encoding strategy, e.g. "int". |
| n_bins | Number of bins for binned/PLE-style numerical transforms. |
| | General feature-level preprocessing override. |
| | Controls bin edge construction. |
| | Optional task hint passed to the preprocessor. |
| | Controls integer-column type inference. |
| | Spline/piecewise preprocessing controls. |
| scaling_strategy | Post-transform scaling, e.g. "minmax". |
Embedding width is not a PreprocessingConfig field in the current API. It is controlled by model config fields such as d_model when an architecture uses EmbeddingLayer.
Running with no numerical preprocessing
Set numerical_preprocessing=None to skip the numerical transform (scaling, binning, or PLE) and feed near-raw numeric values to the network. Setting categorical_preprocessing=None as well leaves categorical encoding at its default.
from deeptab.configs import PreprocessingConfig
from deeptab.models import MambularClassifier
prep = PreprocessingConfig(
numerical_preprocessing=None, # no scaling, binning, or PLE on numeric columns
categorical_preprocessing=None, # leave categorical encoding at its default
)
model = MambularClassifier(preprocessing_config=prep)
Important
None turns off the numerical transform, not the data layer. DeepTab still detects feature types, turns categorical columns into the integer indices the embedding layers expect, handles missing values, and assembles batched tensors. There is no setting that sends a raw, unconverted DataFrame straight into an nn.Module, because the model needs typed, numeric tensors to run.
Note
Most deep tabular models train better with a numerical transform than without one. None is useful when your features are already scaled, or when you want a clean baseline to measure a transform against. For skewed or heavy-tailed inputs, "quantile" or "ple" are usually stronger starting points.
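The effect of a heavy-tailed feature can be seen with a few lines of plain Python. This is a generic rank-based quantile transform written for illustration, not pretab's "quantile" implementation, which may differ (for example in tie handling or the target distribution). The income values are made-up toy inputs.

```python
def minmax(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def quantile_rank(values):
    # Empirical-CDF style rank transform mapped to [0, 1]; ties are ignored for brevity.
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position / (len(values) - 1)
    return ranks


incomes = [20, 25, 30, 35, 40, 1000]  # toy data with one extreme outlier
print([round(v, 3) for v in minmax(incomes)])  # [0.0, 0.005, 0.01, 0.015, 0.02, 1.0]
print(quantile_rank(incomes))                  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
```

Min-max scaling squeezes the five ordinary values into [0, 0.02] because the outlier sets the range, while the rank transform spreads them evenly. This is the kind of input where a quantile-style transform tends to help.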
Trainer Config
TrainerConfig controls fit-time defaults used by the estimator.
from deeptab.configs import TrainerConfig
trainer_config = TrainerConfig(
max_epochs=100,
batch_size=128,
val_size=0.2,
patience=15,
monitor="val_loss",
mode="min",
lr=1e-4,
lr_patience=10,
lr_factor=0.1,
weight_decay=1e-6,
optimizer_type="Adam",
checkpoint_path="model_checkpoints",
)
Valid fields:
| Field | Meaning |
|---|---|
| max_epochs | Maximum Lightning training epochs. |
| batch_size | Batch size for train/validation/prediction loaders. |
| val_size | Fraction held out when no explicit validation set is passed. |
| shuffle | Whether to shuffle the training dataloader. |
| stratify | Whether to stratify the validation split on the class labels (classification only). |
| patience, monitor, mode | Early-stopping settings. |
| lr, lr_patience, lr_factor | Learning rate and the ReduceLROnPlateau patience and decay factor. |
| weight_decay | Optimizer weight decay (L2 penalty). |
| optimizer_type | Case-insensitive name of a registered optimizer (e.g. "Adam"). |
| | Extra kwargs forwarded to the optimizer constructor. |
| scheduler_type | Case-insensitive name of a registered LR scheduler, or None to disable scheduling. |
| scheduler_kwargs | Extra kwargs forwarded to the scheduler constructor. |
| | Override the metric watched by the scheduler. |
| | How many intervals to wait between scheduler steps. |
| checkpoint_path | Directory for the best-model checkpoint. |
Runtime options such as accelerator, devices, precision, gradient_clip_val, and logger/callback settings are Lightning trainer keyword arguments, not TrainerConfig fields. Pass them to fit(...) when needed.
Optimizer registry
optimizer_type resolves through a registry, so any name that is neither a built-in torch.optim class nor a previously registered optimizer raises InvalidParamError immediately, with a list of valid options.
from deeptab.configs import TrainerConfig
from deeptab.training.optimizers import available_optimizers, register_optimizer
print(available_optimizers())
# ['adadelta', 'adagrad', 'adam', 'adamax', 'adamw', 'asgd', ...]
# Register a third-party optimizer (MyMuonOptimizer stands for your own optimizer class)
register_optimizer("muon", MyMuonOptimizer)
tc = TrainerConfig(optimizer_type="muon", lr=1e-3)
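Case-insensitive resolution with an informative error can be sketched with a lower-cased dictionary. This is a hypothetical sketch of the technique: resolve_optimizer and the local InvalidParamError are stand-ins, and strings replace real optimizer classes so the example runs without PyTorch.

```python
class InvalidParamError(ValueError):
    """Hypothetical stand-in for DeepTab's InvalidParamError."""


# Process-global table keyed by lower-cased name (illustrative entries only).
_OPTIMIZERS = {"adam": "torch.optim.Adam", "adamw": "torch.optim.AdamW", "sgd": "torch.optim.SGD"}


def resolve_optimizer(name):
    key = name.lower()  # "Adam", "ADAM" and "adam" all resolve to the same entry
    try:
        return _OPTIMIZERS[key]
    except KeyError:
        valid = ", ".join(sorted(_OPTIMIZERS))
        raise InvalidParamError(f"Unknown optimizer {name!r}; valid options: {valid}") from None


print(resolve_optimizer("Adam"))  # torch.optim.Adam
```

Failing at lookup time, with the valid names in the message, surfaces a typo such as "adma" before any training starts.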
Scheduler registry
scheduler_type resolves through a parallel registry.
from deeptab.configs import TrainerConfig
from deeptab.training.schedulers import available_schedulers, register_scheduler
print(available_schedulers())
# ['constantlr', 'cosineannealinglr', 'cosineannealingwarmrestarts', ...]
# Switch to cosine annealing
tc = TrainerConfig(
scheduler_type="CosineAnnealingLR",
scheduler_kwargs={"T_max": 100, "eta_min": 1e-6},
)
# Disable the scheduler entirely
tc = TrainerConfig(scheduler_type=None)
Important
monitor and mode are forwarded to both early stopping and the LR scheduler, so both always watch the same metric in the same direction. Previously, ReduceLROnPlateau always watched val_loss in min mode, regardless of how early stopping was configured.
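Reading monitor and mode from a single config is what keeps the two consumers aligned. In this sketch, plain dicts stand in for the Lightning EarlyStopping arguments and the ReduceLROnPlateau settings; TrainerSettings, build_callbacks and the "val_accuracy" metric name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainerSettings:
    # Hypothetical subset of TrainerConfig fields.
    monitor: str = "val_loss"
    mode: str = "min"
    patience: int = 15
    lr_patience: int = 10
    lr_factor: float = 0.1


def build_callbacks(cfg):
    # One source of truth: both consumers read monitor/mode from the same config.
    early_stopping = {"monitor": cfg.monitor, "mode": cfg.mode, "patience": cfg.patience}
    plateau_scheduler = {
        "monitor": cfg.monitor,
        "mode": cfg.mode,
        "patience": cfg.lr_patience,
        "factor": cfg.lr_factor,
    }
    return early_stopping, plateau_scheduler


es, sched = build_callbacks(TrainerSettings(monitor="val_accuracy", mode="max"))
print(es["monitor"] == sched["monitor"], es["mode"] == sched["mode"])  # True True
```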
Registry lifecycle
The optimizer, scheduler, and loss registries are plain in-memory dictionaries that live for the lifetime of the Python process. DeepTab fills them with its built-in entries at import time, and any name you add joins the same process-global table.
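| Stage | Optimizer / scheduler | Loss | Metric |
|---|---|---|---|
| Register | register_optimizer / register_scheduler | Subclass BaseLoss with a name, e.g. class FocalLoss(BaseLoss, name="focal") | No registry API; pass metric instances to evaluate(metrics=...) |
| Look up | available_optimizers() / available_schedulers() | | |
| Re-register same name | Raises ValueError unless override=True | Silently replaces the previous class | Not applicable |
| Deregister | unregister_optimizer (and the scheduler equivalent); custom names only | No deregister API | Not applicable |
| Process restart | Built-ins return on import; your entries are gone | Built-ins return on import; re-import yours | Defaults rebuilt on import |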
After you register, the name is usable immediately, everywhere that accepts an optimizer_type, scheduler_type, or loss_fct string, for the rest of that process:
from deeptab.configs import TrainerConfig
from deeptab.training.optimizers import register_optimizer, available_optimizers
register_optimizer("muon", MyMuonOptimizer)
print("muon" in available_optimizers()) # True
TrainerConfig(optimizer_type="muon", lr=1e-3) # resolves now
Registering the same name again is where the registries differ. Optimizers and schedulers refuse to clobber an existing entry unless you opt in:
register_optimizer("muon", MyMuonOptimizer) # ValueError: already registered
register_optimizer("muon", MyMuonOptimizer, override=True) # OK, replaces the entry
A loss registers itself the moment its class body runs, so re-importing or redefining a BaseLoss subclass with the same name silently overwrites the earlier one. There is no override flag and no error:
from deeptab.training.losses import BaseLoss
class FocalLoss(BaseLoss, name="focal"): # replaces the built-in "focal" in this process
    ...
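One common Python mechanism for "register when the class body runs" is __init_subclass__ with a class keyword. Whether DeepTab's BaseLoss uses exactly this hook is an assumption; BaseLossSketch and LOSS_REGISTRY below are hypothetical. The sketch shows why a second class with the same name silently wins.

```python
LOSS_REGISTRY = {}


class BaseLossSketch:
    """Subclasses register themselves under the `name` class keyword."""

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if name is not None:
            # Plain dict assignment: a later class with the same name silently wins.
            LOSS_REGISTRY[name] = cls


class FocalV1(BaseLossSketch, name="focal"):
    pass


class FocalV2(BaseLossSketch, name="focal"):
    pass


print(LOSS_REGISTRY["focal"].__name__)  # FocalV2
```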
Deregistering applies only to optimizers and schedulers, and only to names you added. Built-ins are protected:
from deeptab.training.optimizers import unregister_optimizer
unregister_optimizer("muon") # removes your entry
unregister_optimizer("muon", missing_ok=True) # idempotent: no error if already gone
unregister_optimizer("adam") # ValueError: built-in, cannot be removed
Important
Nothing in any registry is persisted to disk. When the interpreter restarts, only DeepTab’s built-ins come back automatically at import; every custom optimizer, scheduler, or loss you registered must be registered again. Put your register_* calls (and your BaseLoss subclass definitions) in a module that is imported at the top of every training script, so they are present in each new process and in each worker when training with multiple processes (DDP).
Note
Metrics work differently: there is no register_metric function. METRIC_REGISTRY only holds the per-task default lists. To use a custom metric, subclass DeepTabMetric and pass an instance straight to evaluate(metrics={"my_metric": MyMetric()}); nothing is registered, so nothing needs cleanup.
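Passing metric instances keyed by name needs no global state at all. In this sketch, MetricSketch, MeanAbsoluteError, the __call__(y_true, y_pred) interface and the free evaluate function are all hypothetical; DeepTabMetric's real interface and the estimator's evaluate method may look different.

```python
class MetricSketch:
    """Hypothetical metric base: instances are called with targets and predictions."""

    def __call__(self, y_true, y_pred):
        raise NotImplementedError


class MeanAbsoluteError(MetricSketch):
    def __call__(self, y_true, y_pred):
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(y_true, y_pred, metrics):
    # Metrics arrive as instances keyed by display name; no global table is touched.
    return {name: metric(y_true, y_pred) for name, metric in metrics.items()}


scores = evaluate([1.0, 2.0, 4.0], [1.5, 2.0, 3.0], metrics={"my_metric": MeanAbsoluteError()})
print(scores)  # {'my_metric': 0.5}
```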
Controlling the validation split
When you do not pass an explicit validation set, DeepTab holds one out from the training data. The split is governed by TrainerConfig fields, so the split policy lives in the same place as the rest of the training settings.
from deeptab.configs import TrainerConfig
trainer_config = TrainerConfig(
val_size=0.15, # fraction held out when no explicit validation set is passed
shuffle=True, # shuffle before splitting
stratify=True, # keep class proportions in the split (classification only)
)
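| Field | Default | Meaning |
|---|---|---|
| val_size | | Validation fraction used when no explicit validation set (X_val, y_val) is passed. |
| shuffle | | Shuffle before splitting. |
| stratify | True | Stratify the split on the class labels (classification only). |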
The seed for the split comes from the estimator’s random_state (or the random_state you pass to fit()), so the same seed always reproduces the same partition.
Important
stratify applies to classification only. A continuous regression target cannot be stratified, so the flag is ignored there. With stratify=True (the default) a classification split keeps the class balance of the full set; set stratify=False to draw a purely random split, which is useful for very small or rare-class datasets where stratification would otherwise fail.
Note
When you provide your own X_val and y_val, no internal split happens at all, so val_size, shuffle, and stratify do not apply.
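A seeded, stratified hold-out split can be written with the standard library alone. This is an illustrative sketch of the technique, not DeepTab's splitting code (which may delegate to a library splitter); stratified_split is hypothetical and rounds the hold-out size per class, which is a simplification.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_size=0.2, seed=0):
    """Return (train_idx, val_idx), holding out about val_size of each class."""
    rng = random.Random(seed)  # same seed -> same partition
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_size)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = ["a"] * 80 + ["b"] * 20
train, val = stratified_split(labels, val_size=0.15, seed=101)
print(len(val), sum(labels[i] == "b" for i in val))  # 15 3
```

The validation set keeps the 80/20 class ratio (12 "a", 3 "b"), and calling the function again with seed=101 yields the identical partition.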
Observability Config
The three configs above describe the model and how it trains. A fourth, optional config, ObservabilityConfig, controls what gets recorded while training runs: lifecycle events, a per-run artifact directory, and output for experiment trackers such as TensorBoard or MLflow. It is opt-in, so an estimator built without one trains exactly as before and emits nothing.
from deeptab.configs import MambularConfig
from deeptab.core.observability import ObservabilityConfig
from deeptab.models import MambularClassifier
model = MambularClassifier(
model_config=MambularConfig(d_model=64, n_layers=4),
observability_config=ObservabilityConfig(
experiment_name="churn_baseline",
structured_logging=True,
experiment_trackers=["tensorboard"],
),
)
Note
ObservabilityConfig lives in deeptab.core.observability, not deeptab.configs, because it records training rather than defining the model recipe. Unlike the three configs above it is excluded from get_params() and sklearn.clone, so it never takes part in hyperparameter search. The Observability guide has the full field reference, the run-directory layout, and the verbosity levels.
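Keeping a constructor argument out of get_params() is enough to keep it out of cloning, because scikit-learn's clone rebuilds an estimator from get_params(deep=False). In this sketch, RecipeEstimator and clone_from_params are hypothetical, and a plain dict stands in for an ObservabilityConfig.

```python
class RecipeEstimator:
    """Hypothetical estimator: observability settings are stored but not exposed as params."""

    # Only these names take part in get_params(), and therefore in cloning and search.
    _param_names = ("model_config", "preprocessing_config", "trainer_config", "random_state")

    def __init__(self, model_config=None, preprocessing_config=None, trainer_config=None,
                 random_state=None, observability_config=None):
        self.model_config = model_config
        self.preprocessing_config = preprocessing_config
        self.trainer_config = trainer_config
        self.random_state = random_state
        self.observability_config = observability_config

    def get_params(self, deep=False):
        return {name: getattr(self, name) for name in self._param_names}


def clone_from_params(estimator):
    # Roughly what sklearn.clone does: rebuild an unfitted copy from get_params().
    return type(estimator)(**estimator.get_params(deep=False))


original = RecipeEstimator(random_state=101, observability_config={"experiment_name": "churn_baseline"})
cloned = clone_from_params(original)
print(cloned.random_state, cloned.observability_config)  # 101 None
```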
Using Configs Together
from deeptab.configs import MambularConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import MambularClassifier
model = MambularClassifier(
model_config=MambularConfig(d_model=64, n_layers=4),
preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
trainer_config=TrainerConfig(max_epochs=100, batch_size=128, lr=3e-4),
random_state=101,
)
model.fit(X_train, y_train)
If trainer_config is provided, fit() takes its max_epochs, batch_size, val_size, shuffle, stratify, patience, monitor, mode, and checkpoint_path, overriding the matching fit() arguments.
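The precedence rule can be expressed as a dictionary merge in which config values win. In this sketch, resolve_fit_args and TrainerConfigSketch are hypothetical and cover only three fields. The accelerator keyword shows that arguments the config does not own, such as Lightning runtime options, pass through unchanged.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainerConfigSketch:
    # Hypothetical subset of the fields fit() takes from trainer_config.
    max_epochs: int = 100
    batch_size: int = 128
    patience: int = 15


def resolve_fit_args(trainer_config=None, **fit_kwargs):
    """Values from trainer_config win over matching fit() keyword arguments."""
    resolved = dict(fit_kwargs)
    if trainer_config is not None:
        for f in fields(trainer_config):
            resolved[f.name] = getattr(trainer_config, f.name)
    return resolved


args = resolve_fit_args(TrainerConfigSketch(max_epochs=30), max_epochs=5, accelerator="cpu")
print(args)  # {'max_epochs': 30, 'accelerator': 'cpu', 'batch_size': 128, 'patience': 15}
```

A practical consequence: once a trainer_config is set, changing max_epochs in the fit() call has no effect; change it on the config instead.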
Hyperparameter Search
DeepTab estimators expose nested config fields with scikit-learn’s double-underscore syntax.
from sklearn.model_selection import GridSearchCV
from deeptab.configs import MambularConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import MambularClassifier
estimator = MambularClassifier(
model_config=MambularConfig(),
preprocessing_config=PreprocessingConfig(),
trainer_config=TrainerConfig(max_epochs=30, patience=5),
)
param_grid = {
"model_config__d_model": [32, 64, 128],
"model_config__n_layers": [2, 4],
"trainer_config__lr": [1e-3, 3e-4],
"preprocessing_config__numerical_preprocessing": ["standardization", "quantile"],
}
search = GridSearchCV(estimator, param_grid=param_grid, cv=3, n_jobs=1)
search.fit(X_train, y_train)
Use n_jobs=1 for GPU experiments unless you intentionally manage multiple processes and devices.
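It helps to count the training runs before launching a grid like the one above. This sketch expands the same grid with itertools.product, which enumerates the same combinations as scikit-learn's ParameterGrid (possibly in a different order).

```python
from itertools import product

# The same grid as in the GridSearchCV example.
param_grid = {
    "model_config__d_model": [32, 64, 128],
    "model_config__n_layers": [2, 4],
    "trainer_config__lr": [1e-3, 3e-4],
    "preprocessing_config__numerical_preprocessing": ["standardization", "quantile"],
}

# Every combination of values is one candidate setting.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
cv_folds = 3

print(len(candidates))             # 24 = 3 * 2 * 2 * 2
print(len(candidates) * cv_folds)  # 72 training runs during cross-validation
```

With GridSearchCV's default refit=True there is one more fit of the best candidate on the full training data, so 73 trainings in total. At up to 30 epochs each, this is why a small max_epochs and patience are worth setting for searches.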
Inspecting and Updating Parameters
from deeptab.configs import MambularConfig
cfg = MambularConfig(d_model=64)
print(cfg.get_params(deep=False))
cfg.set_params(d_model=128, n_layers=6)
On estimators:
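from deeptab.configs import MambularConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import MambularClassifier
model = MambularClassifier(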
model_config=MambularConfig(),
preprocessing_config=PreprocessingConfig(),
trainer_config=TrainerConfig(),
)
model.set_params(model_config__d_model=128, trainer_config__lr=1e-3)
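The double-underscore keys are routed by splitting on "__" and setting the field on the named component. This follows scikit-learn's convention, but NestedParamsSketch, ModelCfg and TrainerCfg below are hypothetical, and the sketch handles only one level of nesting.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCfg:
    d_model: int = 64
    n_layers: int = 4


@dataclass
class TrainerCfg:
    lr: float = 1e-4


class NestedParamsSketch:
    """Hypothetical estimator routing `component__field` keys into nested configs."""

    def __init__(self, model_config, trainer_config):
        self.model_config = model_config
        self.trainer_config = trainer_config

    def set_params(self, **params):
        for key, value in params.items():
            # "model_config__d_model" -> ("model_config", "d_model"); one nesting level only.
            component_name, _, field_name = key.partition("__")
            component = getattr(self, component_name, None)
            if component is None or field_name not in {f.name for f in fields(component)}:
                raise ValueError(f"Invalid parameter {key!r} for {type(self).__name__}")
            setattr(component, field_name, value)
        return self


model = NestedParamsSketch(ModelCfg(), TrainerCfg())
model.set_params(model_config__d_model=128, trainer_config__lr=1e-3)
print(model.model_config, model.trainer_config)
# ModelCfg(d_model=128, n_layers=4) TrainerCfg(lr=0.001)
```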
Practical Guidance
Start with a small model and explicit trainer settings. Add preprocessing and architecture search only after the baseline runs end to end.
- Use TrainerConfig(max_epochs=30, patience=5) for quick checks.
- Tune lr and batch_size before deep architecture sweeps.
- Keep preprocessing choices in PreprocessingConfig so experiments are reproducible.
- Save the three configs with experiment results; they are the primary recipe for reproducing a model.
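Saving the three configs can be as simple as writing them to JSON next to each run's results. This minimal sketch assumes the configs are dataclasses whose fields are all JSON-serialisable. ModelCfg, PrepCfg, TrainCfg, save_recipe and load_recipe are hypothetical, and real DeepTab configs may hold values that need custom handling.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class ModelCfg:
    d_model: int = 64
    n_layers: int = 4


@dataclass
class PrepCfg:
    numerical_preprocessing: str = "quantile"


@dataclass
class TrainCfg:
    max_epochs: int = 30
    lr: float = 3e-4


def save_recipe(path, model_config, preprocessing_config, trainer_config):
    # The three configs together are the recipe; store them next to your results.
    record = {
        "model_config": asdict(model_config),
        "preprocessing_config": asdict(preprocessing_config),
        "trainer_config": asdict(trainer_config),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)


def load_recipe(path):
    with open(path, encoding="utf-8") as fh:
        record = json.load(fh)
    return (
        ModelCfg(**record["model_config"]),
        PrepCfg(**record["preprocessing_config"]),
        TrainCfg(**record["trainer_config"]),
    )


path = os.path.join(tempfile.mkdtemp(), "recipe.json")
save_recipe(path, ModelCfg(d_model=128), PrepCfg(), TrainCfg())
restored = load_recipe(path)
print(restored[0])  # ModelCfg(d_model=128, n_layers=4)
```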