FAQ

Frequently asked questions about DeepTab and troubleshooting common issues.

General

What’s the difference between DeepTab v1 and v2?

Version 2.0 introduces a fully typed data layer (TabularDataset, TabularDataModule, FeatureSchema, TabularBatch) that makes it easier to work with tabular data at a lower level. The high-level estimator API remains unchanged and is still the recommended interface for most users.

Key changes in v2.0:

  • Automatic stratification for classification tasks

  • Typed batch containers with device management

  • Feature schema tracking with metadata

  • Consistent label shapes across tasks

  • Deprecated MambularDataset/MambularDataModule aliases (use TabularDataset/TabularDataModule)

Important

Note on v1 support: DeepTab v1 is no longer supported following the v2.0 release. The changes in package structure and API design were substantial enough that maintaining backward compatibility would have compromised the improvements in v2. If you’re using v1 in production, we recommend planning a migration to v2. Pin your dependency to deeptab<2.0 if you need to continue using v1, but be aware that no bug fixes or security updates will be provided for the v1 branch.

See the Overview for details on the new data API.

Which model should I use?

Tip

When in doubt, start with MambularClassifier or MambularRegressor.

Mambular tends to work well across a variety of tabular problems. For a full selection guide by dataset size, feature type, and compute constraints, see the Model Comparison page.

Quick pointers:

  • Strong general-purpose baseline → TabM or Mambular

  • Many categorical features → TabTransformer

  • Fastest baseline → MLP or ResNet

  • Uncertainty estimates → any LSS variant

  • Interpretability → NODE or NDTF

Do I need a GPU?

No, but it helps significantly for larger datasets and more complex architectures. The short answer:

  • MLP, ResNet, TabM, MambaTab: train comfortably on CPU up to ~100K to 500K rows.

  • Mambular, TabulaRNN, TabTransformer, NODE: CPU is fine up to ~10K to 20K rows; GPU recommended beyond that.

  • FTTransformer, AutoInt, MambAttention, ENODE, NDTF, TabR: GPU recommended above ~5K to 10K rows.

  • SAINT: GPU strongly recommended above ~2K rows (row attention makes every batch expensive).

For a full per-model breakdown including the cost driver for each architecture, see the Model Zoo Comparison Tables in the Model Zoo.
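The thresholds above can be folded into a small lookup for quick checks. This is only a sketch: the helper `gpu_recommended` and the table are illustrative names, not part of DeepTab, and each cut-off is assumed to be the upper end of the range quoted in the list.

```python
# Row counts above which a GPU is recommended, taken from the upper end of
# each range in the list above. Illustrative only; not a DeepTab API.
GPU_ROW_THRESHOLDS = {
    500_000: ["MLP", "ResNet", "TabM", "MambaTab"],
    20_000: ["Mambular", "TabulaRNN", "TabTransformer", "NODE"],
    10_000: ["FTTransformer", "AutoInt", "MambAttention", "ENODE", "NDTF", "TabR"],
    2_000: ["SAINT"],
}


def gpu_recommended(model_name: str, n_rows: int) -> bool:
    """Return True if n_rows exceeds the CPU-friendly range for model_name."""
    for threshold, names in GPU_ROW_THRESHOLDS.items():
        if model_name in names:
            return n_rows > threshold
    raise KeyError(f"No guidance recorded for {model_name!r}")


print(gpu_recommended("MLP", 50_000))    # False
print(gpu_recommended("SAINT", 50_000))  # True
```

Treat the result as a starting point; feature count and model width also affect training cost.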

How do I know if my GPU is being used?

Check CUDA availability:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")

DeepTab will automatically use the first available GPU. If CUDA is available but training is no faster, check that the dataset and batch size are large enough: small batches rarely benefit from GPU parallelism.
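For a slightly more detailed check, the sketch below reports how many GPUs PyTorch can see and the name of the first one. The function name `cuda_report` is hypothetical, and it only inspects PyTorch; it does not query DeepTab itself.

```python
import importlib.util


def cuda_report() -> str:
    """Summarize what PyTorch can see, without failing if PyTorch is missing."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch

    if not torch.cuda.is_available():
        return "CUDA not available; training will run on CPU"
    count = torch.cuda.device_count()
    first = torch.cuda.get_device_name(0)
    return f"{count} GPU(s) visible; first device: {first}"


print(cuda_report())
```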

Can I use DeepTab with PyTorch dataloaders?

Note

The high-level API uses TabularDataModule internally, but you can access TabularDataset directly for custom data loading.

Yes. The internal TabularDataModule creates PyTorch DataLoader instances. If you need custom data loading logic, you can use TabularDataset directly:

from deeptab.data import TabularDataset
from torch.utils.data import DataLoader

dataset = TabularDataset(
    cat_feature_list=[...],
    num_feature_list=[...],
    embedding_feature_list=None,
    y=labels,
)

dataloader = DataLoader(dataset, batch_size=128, shuffle=True)

Data and preprocessing

What data types are supported?

DeepTab automatically handles:

  • Numerical: int, float dtypes

  • Categorical: object, category, bool dtypes

  • Embeddings: Pass pre-computed embeddings via the embeddings parameter of fit()
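To preview how columns are likely to be grouped before fitting, the dtype rule above can be approximated in pandas. This is a sketch under that assumption, not DeepTab's actual detection code; `split_by_dtype` is a hypothetical helper, and it places anything that is not a number (including datetimes) in the categorical group.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def split_by_dtype(df: pd.DataFrame):
    """Approximate the dtype rule: numbers are numerical, everything else categorical."""
    numerical, categorical = [], []
    for col in df.columns:
        dtype = df[col].dtype
        # pandas treats bool as numeric, so exclude it explicitly
        if is_numeric_dtype(dtype) and not is_bool_dtype(dtype):
            numerical.append(col)
        else:
            categorical.append(col)
    return numerical, categorical


df = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [52000.0, 61000.5, 48000.0],
    "city": ["NYC", "Boston", "Chicago"],
    "tier": pd.Series(["a", "b", "a"], dtype="category"),
    "active": [True, False, True],
})
numerical, categorical = split_by_dtype(df)
print(numerical)    # ['age', 'income']
print(categorical)  # ['city', 'tier', 'active']
```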

How do I handle missing values?

Tip

No manual imputation needed! DeepTab handles missing values automatically.

Missing values are imputed during preprocessing, so a DataFrame containing NaN or None can be passed to fit() as is:

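import numpy as np
import pandas as pd
from deeptab.models import MambularClassifier

# DataFrame with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 47, 51],
    "city": ["NYC", "Boston", None, "Chicago"],
})

# Works without manual imputation (y holds the target labels, one per row)
model = MambularClassifier()
model.fit(df, y, max_epochs=50)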

The pretab preprocessor (used internally) applies median imputation for numerical features and mode imputation for categoricals by default.
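To make the default concrete, here is what median and mode imputation do to a small frame, written in plain pandas. It illustrates the strategy only and is not pretab's implementation; the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 47, 51],
    "city": ["NYC", "NYC", None, "Chicago"],
})

imputed = pd.DataFrame({
    # Median of the observed ages (25, 47, 51) is 47
    "age": df["age"].fillna(df["age"].median()),
    # The most frequent observed city is "NYC"
    "city": df["city"].fillna(df["city"].mode().iloc[0]),
})
print(imputed["age"].tolist())   # [25.0, 47.0, 47.0, 51.0]
print(imputed["city"].tolist())  # ['NYC', 'NYC', 'NYC', 'Chicago']
```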

Can I use NumPy arrays instead of DataFrames?

Yes. DeepTab accepts both:

import numpy as np
from deeptab.models import MambularClassifier

# NumPy arrays work
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, size=1000)

model = MambularClassifier()
model.fit(X, y, max_epochs=50)

However, DataFrames are recommended because they preserve column names and types, which helps with feature type detection and preprocessing.

How do I tell DeepTab which columns are categorical?

DeepTab infers feature types from DataFrame dtypes:

# Ensure categorical columns have the right dtype
df["city"] = df["city"].astype("category")
df["user_id"] = df["user_id"].astype("category")  # Numeric ID, but categorical

model = MambularClassifier()
model.fit(df, y, max_epochs=50)

If you’re using NumPy arrays, all features are treated as numerical by default.
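If a NumPy array contains columns that are really categorical, one option is to wrap it in a DataFrame and cast those columns before fitting. The column names below are hypothetical, and the data is random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=6),          # a continuous feature
    rng.integers(0, 3, size=6),  # integer codes for a categorical feature
])

# Hypothetical column names; use whatever describes your data
df = pd.DataFrame(X, columns=["score", "segment"])
# column_stack promoted the codes to float, so restore ints before casting
df["segment"] = df["segment"].astype(int).astype("category")

print(df.dtypes)
```

The resulting frame can then be passed to fit() in place of the raw array.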

What if I have text or image data?

DeepTab is designed for tabular data. For text or images:

  1. Use a pre-trained encoder to generate embeddings

  2. Pass embeddings via the embeddings parameter of fit()

from sentence_transformers import SentenceTransformer

# Encode text to embeddings
text_model = SentenceTransformer("all-MiniLM-L6-v2")
text_embeddings = text_model.encode(df["description"].tolist())

# Pass embeddings alongside tabular features
X_tabular = df.drop(columns=["description", "target"])
model = MambularClassifier()
model.fit(X_tabular, y, embeddings=text_embeddings, max_epochs=50)

Can I customize preprocessing per feature?

Not directly. PreprocessingConfig applies the same strategy to all numerical features. If you need per-feature preprocessing, apply it manually before passing to DeepTab:

import numpy as np
import pandas as pd
from deeptab.models import MambularClassifier

# Custom preprocessing
df["log_income"] = np.log1p(df["income"])
df["age_binned"] = pd.cut(df["age"], bins=5).astype("category")

# Then fit DeepTab
model = MambularClassifier()
model.fit(df, y, max_epochs=50)

Training and performance

How do I speed up training?

Tip

Combine GPU acceleration with larger batch sizes and early stopping for the fastest training.

Several options:

  1. Use a GPU: install CUDA-enabled PyTorch

  2. Increase batch size: larger batches are more efficient when memory allows (TrainerConfig(batch_size=...))

  3. Reduce epochs: rely on early stopping instead of a fixed epoch count

  4. Use multi-worker data loading: pass num_workers through dataloader_kwargs in fit()

from deeptab.configs import TrainerConfig
from deeptab.models import MambularClassifier

model = MambularClassifier(
    trainer_config=TrainerConfig(
        batch_size=512,   # Larger batch size
        patience=10,      # Early stopping
    )
)

# num_workers is a DataLoader option, so pass it via dataloader_kwargs
model.fit(X_train, y_train, dataloader_kwargs={"num_workers": 4}, max_epochs=100)

Training is slow on GPU

Note

GPUs need larger batch sizes to show a speedup over CPU. Small batches or datasets may run faster on CPU.

Ensure you’re using GPU:

import torch
print(torch.cuda.is_available())  # Should be True

If True but still slow:

  • Small batches: GPU efficiency requires larger batches (try 256+)

  • Small dataset: for < 1K samples, CPU may be faster due to transfer overhead

  • CPU bottleneck: increase num_workers via dataloader_kwargs in fit() for faster data loading

How do I use early stopping?

Early stopping is enabled by default. Adjust patience:

from deeptab.configs import TrainerConfig

model = MambularClassifier(
    trainer_config=TrainerConfig(
        patience=15,  # Stop if no improvement for 15 epochs
    )
)

Provide an explicit validation set for better early stopping:

model.fit(
    X_train, y_train,
    X_val=X_val, y_val=y_val,
    max_epochs=100,
)
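One common way to obtain the validation arrays is scikit-learn's `train_test_split`. The sketch below uses random data and an assumed 80/20 split; the variable names match the call above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# Hold out 20% for validation; stratify keeps class proportions similar
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
print(X_train.shape, X_val.shape)  # (800, 10) (200, 10)
```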

How do I save a trained model?

Use the .deeptab extension. DeepTab warns when a different extension is used.

# Save
model.save("my_model.deeptab")

# Load
from deeptab.models import MambularClassifier
loaded = MambularClassifier.load("my_model.deeptab")
predictions = loaded.predict(X_test)

The artifact includes weights, fitted preprocessor, feature schema, and task metadata.

Can I resume training from a checkpoint?

Not directly through the estimator API. If you need this, consider using TabularDataModule with PyTorch Lightning’s checkpointing directly.

How do I monitor training metrics?

DeepTab shows a progress bar by default. For richer per-epoch metrics, pass train_metrics/val_metrics dicts to fit(), or attach an experiment tracker through ObservabilityConfig:

from deeptab.core.observability import ObservabilityConfig

model = MambularClassifier(
    observability_config=ObservabilityConfig(verbosity=2, experiment_trackers=["tensorboard"]),
)

For fully custom metrics, use Lightning callbacks (advanced usage, see the Lightning docs).

Errors and troubleshooting

CUDA out of memory

Warning

GPU memory errors usually indicate that the batch size is too large for your GPU.

Reduce batch size:

from deeptab.configs import TrainerConfig

model = MambularClassifier(
    trainer_config=TrainerConfig(batch_size=64)  # Smaller batch size
)

Or force CPU training by passing the Lightning accelerator to fit():

model = MambularClassifier()
model.fit(X_train, y_train, accelerator="cpu")
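If you would rather find a workable batch size automatically, a retry loop that halves the batch size after each out-of-memory error is one option. This is a generic sketch, not a DeepTab feature: `fit_with_backoff` and `fake_fit` are hypothetical names, and the stand-in function simulates the error so the example runs anywhere.

```python
def fit_with_backoff(fit_fn, batch_size=512, min_batch_size=16):
    """Call fit_fn(batch_size), halving the batch size after each out-of-memory error."""
    while batch_size >= min_batch_size:
        try:
            return fit_fn(batch_size), batch_size
        except RuntimeError as err:
            # PyTorch raises CUDA OOM as a RuntimeError subclass whose message
            # contains "out of memory"; anything else is re-raised unchanged.
            if "out of memory" not in str(err).lower():
                raise
            batch_size //= 2
    raise RuntimeError(f"Out of memory even at batch size {min_batch_size}")


def fake_fit(batch_size):
    # Stand-in for building and fitting a model; pretends 128 is the largest size that fits
    if batch_size > 128:
        raise RuntimeError("CUDA out of memory (simulated)")
    return "fitted"


result, used = fit_with_backoff(fake_fit)
print(result, used)  # fitted 128
```

In a real run, `fit_fn` would build a fresh MambularClassifier with TrainerConfig(batch_size=batch_size) and call fit() on it.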

ValueError: could not convert string to float

Tip

This usually means categorical features weren’t properly detected. Explicitly set dtypes.

This happens when a column containing strings is treated as numerical. Give such columns a categorical dtype:

df["city"] = df["city"].astype("category")

Or check for unexpected non-numeric values in numerical columns.
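To locate the offending values, `pd.to_numeric` with `errors="coerce"` can flag entries that fail to parse. The helper name and sample data below are illustrative.

```python
import pandas as pd


def non_numeric_values(series: pd.Series) -> list:
    """Return the distinct values in series that cannot be parsed as numbers."""
    parsed = pd.to_numeric(series, errors="coerce")
    # Keep entries that were present in the input but became NaN after coercion
    bad = series[parsed.isna() & series.notna()]
    return sorted(bad.astype(str).unique())


income = pd.Series(["52000", "61000", "n/a", "48,000", None, "39000"])
print(non_numeric_values(income))  # ['48,000', 'n/a']
```

Genuinely missing entries (None or NaN) are not reported, since they are handled by imputation rather than conversion.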

ImportError: No module named 'deeptab'

Ensure DeepTab is installed in the active environment:

pip list | grep deeptab

If not listed:

pip install deeptab
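A frequent cause is installing into one environment and running another. The sketch below, using only the standard library, prints the interpreter in use and whether it can find the package; `is_installed` is a hypothetical helper.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if package can be found by the running interpreter."""
    return importlib.util.find_spec(package) is not None


print("Interpreter:", sys.executable)
print("deeptab importable:", is_installed("deeptab"))
```

If the interpreter path is not the one you expect, running `python -m pip install deeptab` with that same interpreter installs into the matching environment.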

AttributeError: 'TabularDataModule' object has no attribute 'embedding_feature_info'

This was a bug in early v2.0 pre-releases. Upgrade to v2.0.0 or later:

pip install --upgrade deeptab

Training is unstable (loss explodes)

Warning

An exploding loss usually indicates that the learning rate is too high or that the data contains extreme values.

Try reducing learning rate:

from deeptab.configs import TrainerConfig

model = MambularClassifier(
    trainer_config=TrainerConfig(lr=1e-4)  # Lower learning rate
)

Or enable gradient clipping, which is off by default. Pass it to fit() as a Lightning trainer argument:

model = MambularClassifier()
model.fit(X_train, y_train, gradient_clip_val=0.5)
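For intuition, Lightning clips by gradient norm by default: when the overall L2 norm of the gradients exceeds the limit, all gradients are rescaled so the norm equals the limit. The pure-Python sketch below shows that rescaling on a toy vector; it is a simplification for illustration, not Lightning's or PyTorch's code.

```python
import math


def clip_by_norm(grad, max_norm):
    """Scale grad so its L2 norm is at most max_norm; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# The original norm is 5.0, so the vector is scaled down by a factor of 10
clipped = clip_by_norm([3.0, 4.0], max_norm=0.5)
print(clipped)
```

The direction of the update is preserved; only its length is capped.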

RuntimeError: Expected all tensors to be on the same device

Note

The high-level estimator API handles device management automatically. This error typically occurs only with custom training loops.

In a custom loop, move the whole batch to the same device as the model before the forward pass:

batch = batch.to("cuda")  # Move entire batch

Model-specific

What’s the difference between Mambular and MambaTab?

Both use Mamba (State Space Model) blocks, but differ in how they process features:

  • Mambular: Sequential model. Processes features one at a time in sequence, learning dependencies between features.

  • MambaTab: Joint model. Applies Mamba to a concatenated representation of all features at once.

Mambular tends to work better for datasets where feature order matters or where you want to learn sequential dependencies.

When should I use distributional regression (LSS)?

Tip

Use LSS models when you need uncertainty estimates, not just point predictions.

Use LSS models when you need:

  • Uncertainty quantification: Know when predictions are confident vs uncertain

  • Prediction intervals: Generate bounds around each prediction (e.g., 95% intervals)

  • Heteroscedastic noise: Model varying noise levels across inputs

  • Risk-aware decisions: Use full distributions for downstream optimization

Example:

from deeptab.models import MambularLSS

model = MambularLSS()
model.fit(X_train, y_train, family="normal", max_epochs=50)

# Get mean and std for each prediction
params = model.predict(X_test)
mean = params[:, 0]
std = params[:, 1]

# 95% prediction interval
lower = mean - 1.96 * std
upper = mean + 1.96 * std
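The 1.96 multiplier is the normal quantile for a 95% central interval. For other coverage levels, the multiplier can be computed with the standard library, as in this sketch (the helper name `normal_interval` is hypothetical, and it assumes a normal predictive distribution as in the example above):

```python
from statistics import NormalDist


def normal_interval(mean, std, level=0.95):
    """Central prediction interval for a normal distribution with the given mean and std."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return mean - z * std, mean + z * std


lower, upper = normal_interval(mean=10.0, std=2.0, level=0.95)
print(round(lower, 2), round(upper, 2))  # 6.08 13.92
```

Because the function only uses arithmetic on its inputs, it also works elementwise when `mean` and `std` are NumPy arrays.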

Can I use my own custom architecture?

Yes, but it requires subclassing BaseTaskModel. See the source code for examples of how to extend the base classes.

Do experimental models work the same way as stable models?

Yes, the API is identical. The only difference is that experimental models may change without a deprecation cycle:

from deeptab.models.experimental import TromptClassifier

# Same API as stable models
model = TromptClassifier()
model.fit(X_train, y_train, max_epochs=50)

Integration

Can I use DeepTab with scikit-learn pipelines?

Yes:

from sklearn.pipeline import Pipeline
from deeptab.models import MambularClassifier

pipeline = Pipeline([
    ("model", MambularClassifier()),
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

Note: DeepTab does its own preprocessing, so additional preprocessing steps in the pipeline may be redundant.

Does GridSearchCV work?

Yes:

from sklearn.model_selection import GridSearchCV
from deeptab.models import MambularClassifier

search = GridSearchCV(
    estimator=MambularClassifier(),
    param_grid={
        "model_config__d_model": [64, 128],
        "trainer_config__lr": [1e-3, 5e-4],
    },
    cv=5,
)
search.fit(X_train, y_train)

Note: Set n_jobs=1 in GridSearchCV when training on a GPU; otherwise parallel workers all compete for the same device.
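Grid search trains one model per parameter combination per fold, which adds up quickly for neural networks. A quick count for the grid above:

```python
from itertools import product

param_grid = {
    "model_config__d_model": [64, 128],
    "trainer_config__lr": [1e-3, 5e-4],
}
cv = 5

# Every combination of values is trained once per fold
combinations = list(product(*param_grid.values()))
n_fits = len(combinations) * cv
print(len(combinations), n_fits)  # 4 20
```

With scikit-learn's default `refit=True`, one more fit on the full training set follows, for 21 training runs in total.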

Can I deploy DeepTab models?

Yes. For deployment, use InferenceModel. It validates the input schema and exposes only the inference surface, preventing accidental retraining in production:

# Training environment
model.save("model.deeptab")

# Deployment environment
from deeptab import InferenceModel
model = InferenceModel.from_path("model.deeptab")

X_clean = model.validate_input(X_new)  # raises on schema mismatch
predictions = model.predict(X_clean)

See the Inference Model guide for the full deployment workflow.

Advanced usage

How do I access the underlying PyTorch model?

For most inspection needs, use the public helpers model.summary(), model.describe(), and model.parameter_table(). They work once the model is built or fitted and do not require touching internals.

model = MambularClassifier()
model.fit(X_train, y_train, max_epochs=50)

print(model.summary())        # human-readable overview
info = model.describe()       # structured dict (architecture, task, params, ...)

If you need direct access for advanced work, the fitted Lightning module lives in the private model._task_model attribute, and the raw nn.Module architecture is model._task_model.estimator. These are internal and may change between releases.

Can I use custom loss functions?

Not directly through the estimator API. If you need custom losses, use TabularDataModule with a custom Lightning module.

How do I extract learned features?

Access intermediate representations:

model = MambularClassifier()
model.fit(X_train, y_train, max_epochs=50)

# The raw architecture lives on the fitted Lightning module (internal API)
architecture = model._task_model.estimator

This is an advanced use case. See the source code for details.

Can I use multiple GPUs?

DeepTab uses the first available GPU by default. For multi-GPU training, use Lightning’s distributed strategies directly with TabularDataModule (advanced usage).

Contributing and support

How do I report a bug?

Open an issue on GitHub with:

  • DeepTab version (import deeptab; print(deeptab.__version__))

  • Python version

  • PyTorch version

  • Minimal reproducible example

  • Full error traceback
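The version details can be gathered in one go with the standard library. This is a convenience sketch; `package_version` is a hypothetical helper, and it assumes the distributions are named `deeptab` and `torch`, as on PyPI.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


print("Python :", platform.python_version())
for name in ("deeptab", "torch"):
    print(f"{name:7}:", package_version(name))
```

Paste the output into the issue alongside the reproducible example and traceback.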

How do I request a feature?

Open a feature request on GitHub describing:

  • The use case

  • Why existing features don’t solve it

  • Proposed API (if applicable)

How do I contribute?

See the Contributing guide for:

  • Setting up the development environment

  • Running tests

  • Code style guidelines

  • Submitting pull requests

Where can I get help?

  • Check this FAQ first

  • Search GitHub issues

  • Open a new issue for bugs or questions

  • Join discussions on the GitHub repo

Performance comparisons

How does DeepTab compare to XGBoost?

It depends on the dataset:

  • Small datasets (< 1K samples): XGBoost often wins

  • Large datasets (> 10K samples): DeepTab is often competitive or better, especially when feature interactions are complex

  • Categorical-heavy data: XGBoost may be more efficient

  • Need for uncertainty: DeepTab LSS models provide distributional predictions

Use both and compare on your specific data. DeepTab makes experimentation easy.

Is DeepTab faster than training PyTorch manually?

No, DeepTab uses PyTorch under the hood. It provides convenience, not speed improvements. However, it does:

  • Apply sensible defaults (early stopping, LR scheduling)

  • Handle device management automatically

  • Provide efficient data loading

So while not “faster”, it helps you get to a working model more quickly.

Still have questions?

If your question isn’t answered here:

  1. Check the Core Concepts guide

  2. Browse the Tutorials

  3. Search GitHub issues

  4. Open a new issue on GitHub