Overview
DeepTab is a library for deep learning on tabular data with a scikit-learn-compatible interface. It handles feature preprocessing, batching, and the training loop, so the workflow reduces to fit, predict, and evaluate rather than hand-written PyTorch training code and data loaders. Every architecture supports three tasks: classification, regression, and distributional regression for uncertainty quantification.
What is DeepTab?
DeepTab provides 15 stable neural architectures for tabular data:
| Family | Models | Notes |
|---|---|---|
| State Space Models | Mambular, MambaTab, MambAttention | Mamba-inspired; linear feature-sequence scaling |
| Transformers | FTTransformer, TabTransformer, SAINT, AutoInt | Feature, row, and self-attention over feature interactions |
| Residual networks | ResNet, TabR | Skip-connection MLP and retrieval-augmented |
| Tree-inspired | NODE, ENODE, NDTF | Differentiable soft-tree structures |
| General baselines | MLP, TabM, TabulaRNN | Dense, parameter-efficient ensemble, and recurrent |
Three experimental models are also available: ModernNCA, Tangos, and Trompt.
Example:
from deeptab.models import FTTransformerClassifier
# X_train, y_train, X_test, and y_test are assumed to be your own train/test split
model = FTTransformerClassifier()
model.fit(X_train, y_train, max_epochs=100)
predictions = model.predict(X_test)
metrics = model.evaluate(X_test, y_test)
One model, three tasks
Every architecture comes in three variants. Change the suffix to change the task:
| Class | Task | Output |
|---|---|---|
|  | Classification | Labels and probabilities |
|  | Regression | Continuous values |
|  | Distributional regression | Distribution parameters |
from deeptab.models import MambularClassifier, MambularRegressor, MambularLSS
clf = MambularClassifier() # labels and probabilities
reg = MambularRegressor() # point estimates
lss = MambularLSS() # full predictive distribution
The interface is identical across all three, so you can move between tasks, or swap architectures, without rewriting your pipeline.
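To make "distribution parameters" concrete, here is a minimal sketch, independent of DeepTab, of what becomes possible once a model predicts a mean and a standard deviation per row. It assumes a Gaussian predictive distribution; the function names are illustrative and not part of the DeepTab API, and which distribution families the LSS models support is not covered here.

```python
import math
from statistics import NormalDist


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of observation y under Normal(mu, sigma)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)


def prediction_interval(mu, sigma, coverage=0.95):
    """Central interval containing `coverage` of the predictive mass."""
    tail = (1 - coverage) / 2
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical per-row outputs of a distributional model: (mean, std).
params = [(10.0, 1.0), (10.0, 3.0)]
for mu, sigma in params:
    low, high = prediction_interval(mu, sigma)
    print(f"mean={mu:.1f} std={sigma:.1f} 95% interval=({low:.2f}, {high:.2f})")
```

Both rows share the same point estimate, but the second has a wider interval. A plain regressor cannot express that difference.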
Design Philosophy
Familiar Interface
If you know scikit-learn, you know DeepTab. Models expose the standard fit/predict API, so they plug into scikit-learn tools such as GridSearchCV:
from sklearn.model_selection import GridSearchCV
from deeptab.models import FTTransformerClassifier
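# param_grid is assumed to be defined beforehand: a dict mapping parameter names to candidate values
search = GridSearchCV(FTTransformerClassifier(), param_grid, cv=5)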
search.fit(X, y)
Defaults and Configuration
With no configuration, DeepTab detects feature types, encodes and scales them, and imputes missing values during preprocessing. Training runs on a GPU when one is available, with early stopping and checkpointing enabled.
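As an illustration of the early-stopping default, the sketch below shows the usual patience-based rule: stop once the validation loss has not improved for a fixed number of epochs, and treat the best epoch as the one to checkpoint. This is a generic sketch, not DeepTab's implementation; the parameter names `patience` and `min_delta` are assumptions.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops after `patience` consecutive epochs without an
    improvement of more than `min_delta` over the best loss seen so far.
    Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch  # a checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran through every epoch


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
stop, best = early_stopping_epoch(losses, patience=3)
print(f"stopped at epoch {stop}, best checkpoint from epoch {best}")
```

With these losses the rule stops before the late improvement at the final epoch. That is the trade-off a small patience value makes.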
Each part of the stack is configurable when you need it. Architecture, preprocessing, and training settings live in separate config objects, so you can change one without touching the others:
from deeptab.configs import ResNetConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import ResNetClassifier
model = ResNetClassifier(
    model_config=ResNetConfig(d_model=128),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
    trainer_config=TrainerConfig(lr=1e-3, batch_size=256),
)
Built for real datasets
Beyond the defaults above, DeepTab handles details that come up with real data:
Mixed numerical, categorical, and precomputed embedding features in a single model
Automatic stratified splits for classification, preserving class proportions
Mini-batch training, so per-step memory depends on the batch size rather than the dataset size
Multi-device and other Lightning training strategies, enabled by forwarding trainer arguments to fit()
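The stratified splits listed above can be sketched in plain Python. This is not DeepTab's code; it is a minimal illustration, with a hypothetical function name and parameters, of how splitting each class separately preserves class proportions in the validation set.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Imbalanced toy labels: 90% class 0, 10% class 1.
labels = [0] * 90 + [1] * 10
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)
print(Counter(labels[i] for i in train_idx))
print(Counter(labels[i] for i in val_idx))
```

A purely random split of the same data could leave the validation set with very few minority-class rows, or none, which makes validation metrics unreliable.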
When to Use DeepTab
DeepTab and gradient-boosted trees (XGBoost, LightGBM, CatBoost) are complementary tools. The table below shows where each tends to be the stronger starting point.
| Situation | Stronger starting point |
|---|---|
| Mixed numerical and categorical features | DeepTab or gradient-boosted trees |
| A few thousand samples or more | DeepTab |
| Complex feature interactions | DeepTab |
| Calibrated uncertainty through distributional regression | DeepTab ( |
| Classification, regression, and distributional models behind one interface | DeepTab |
| Integration with scikit-learn pipelines and | DeepTab or gradient-boosted trees |
| Small datasets where overfitting is a risk | Gradient-boosted trees |
| Data that does not fit in memory | Gradient-boosted trees |
| Latency-critical inference | Gradient-boosted trees |
Note
These are general guidelines, not strict rules. Results depend on the dataset, so when performance matters it is worth benchmarking DeepTab against a gradient-boosted baseline.
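One way to run such a comparison is to score every candidate on the same cross-validation folds. The sketch below is a dependency-free illustration with two toy stand-in models; `kfold_accuracy` and both model classes are hypothetical. In practice scikit-learn's `cross_val_score` covers this, and any estimator exposing `fit` and `predict` could take the place of the stand-ins.

```python
from collections import Counter


class MajorityClass:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class SignOfFirstFeature:
    """Toy model: predicts 1 when the first feature is positive, else 0."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if row[0] > 0 else 0 for row in X]


def kfold_accuracy(make_model, X, y, k=5):
    """Mean accuracy over k contiguous folds; a fresh model is fit per fold."""
    n = len(X)
    scores = []
    for fold in range(k):
        start, stop = fold * n // k, (fold + 1) * n // k
        test_idx = range(start, stop)
        train_idx = [i for i in range(n) if not start <= i < stop]
        model = make_model().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = model.predict([X[i] for i in test_idx])
        hits = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(hits / len(test_idx))
    return sum(scores) / k


# Toy data: the label is 1 exactly when the first feature is positive.
X = [[(-1) ** i * (i + 1)] for i in range(20)]
y = [1 if row[0] > 0 else 0 for row in X]

for name, factory in [("majority", MajorityClass), ("sign", SignOfFirstFeature)]:
    print(name, kfold_accuracy(factory, X, y))
```

Because every candidate sees identical folds, score differences reflect the models and not the split.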
Next Steps
Installation: Set up in a couple of minutes
Quickstart: Train your first model in a few minutes
Tutorials: End-to-end workflows
FAQ: Common questions