Tangos

Tangos is an MLP-style tabular model with a gradient-attribution regularizer. It encourages hidden units to become specialized and diverse by penalizing latent-unit attributions with respect to input features.

Warning

Experimental model: Tangos is not covered by stable-model semantic versioning. Pin the exact DeepTab version for reproducible experiments.

Overview

Tangos is not a custom optimizer in the current DeepTab implementation. It is a feedforward network trained with the normal DeepTab optimizer, plus an additional penalty computed from the Jacobian of hidden representations with respect to input features.

The research hypothesis is that tabular MLPs generalize better when hidden units:

specialize on a sparse subset of input features, and
avoid learning highly overlapping feature attributions.

Property	DeepTab Tangos
Base architecture	MLP
Additional mechanism	Jacobian-based specialization and orthogonalization penalty
Training hook	`penalty_forward`
Main cost driver	`torch.func.jacrev` / Jacobian computation
Best baseline comparisons	MLP, ResNet, TabM

Architectural Details

The forward path is a standard dense network:

raw preprocessed features
    |
Linear -> activation -> dropout
    |
Linear -> activation -> dropout
    |
...
    |
Linear output head
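The diagram above can be sketched in plain PyTorch. This is a minimal illustration, not the DeepTab class: the name `DenseBody` and its constructor arguments are hypothetical, and optional normalization, GLU, and skip connections are left out. The widths follow the documented `layer_sizes` default.

```python
import torch
from torch import nn


class DenseBody(nn.Module):
    """Hypothetical stand-in for the Tangos dense body (not the DeepTab class)."""

    def __init__(self, in_features, layer_sizes=(256, 128, 32), dropout=0.2, out_features=1):
        super().__init__()
        blocks = []
        prev = in_features
        for width in layer_sizes:
            # One "Linear -> activation -> dropout" stage per entry in layer_sizes.
            blocks.append(nn.Sequential(nn.Linear(prev, width), nn.ReLU(), nn.Dropout(dropout)))
            prev = width
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Linear(prev, out_features)  # linear output head

    def repr_forward(self, x):
        # Hidden representation h(x): everything before the output head.
        for block in self.blocks:
            x = block(x)
        return x

    def forward(self, x):
        return self.head(self.repr_forward(x))


model = DenseBody(in_features=10)
x = torch.randn(4, 10)
hidden = model.repr_forward(x)  # shape (4, 32)
output = model(x)               # shape (4, 1)
```

Splitting the body into `repr_forward` and a separate head keeps the function that the Jacobian is taken of distinct from the task prediction.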

During training, Tangos computes a representation Jacobian:

\[ J_{h,x} = \frac{\partial h(x)}{\partial x} \]

where \(h(x)\) is the representation before the final output head. The model builds latent-unit attribution vectors from this Jacobian and adds:

a specialization term, based on the L1 norm of neuron attributions, and
an orthogonality term, based on cosine similarity between attribution vectors of different hidden units.

The training loss is:

\[ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda_1 \mathcal{L}_{\text{spec}} + \lambda_2 \mathcal{L}_{\text{orth}} \]
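To make the two penalty terms concrete, the sketch below computes a per-sample Jacobian with `torch.func.vmap(torch.func.jacrev(...))` and derives both terms from it. It is an illustration under stated assumptions rather than the DeepTab code: the function name `tangos_penalty`, the mean-based normalization, the use of absolute cosine similarity, and the toy `body` network are all assumptions, and the `subsample` option is not modeled.

```python
import torch
from torch import nn
from torch.func import jacrev, vmap

torch.manual_seed(0)

# Hypothetical representation function h(x): 6 input features -> 4 hidden units.
# Dropout is omitted here so the function is deterministic under vmap.
body = nn.Sequential(nn.Linear(6, 8), nn.ReLU(), nn.Linear(8, 4), nn.ReLU())


def tangos_penalty(repr_fn, x, lamda1=0.5, lamda2=0.1, eps=1e-8):
    # Per-sample Jacobian dh/dx with shape (batch, hidden_units, features).
    # Row i of each sample's Jacobian is the attribution vector of hidden unit i.
    jac = vmap(jacrev(repr_fn))(x)

    # Specialization term: mean absolute attribution (an L1-style sparsity penalty).
    spec = jac.abs().mean()

    # Orthogonality term: mean absolute cosine similarity between the
    # attribution vectors of distinct hidden units.
    unit = jac / (jac.norm(dim=-1, keepdim=True) + eps)
    cos = unit @ unit.transpose(1, 2)  # (batch, hidden_units, hidden_units)
    n_hidden = jac.shape[1]
    i, j = torch.triu_indices(n_hidden, n_hidden, offset=1)  # pairs with i < j
    orth = cos[:, i, j].abs().mean()

    return lamda1 * spec + lamda2 * orth, spec, orth


x = torch.randn(5, 6)
penalty, spec, orth = tangos_penalty(body, x)
penalty.backward()  # gradients flow back to the parameters of `body`
```

The Jacobian holds one entry per (sample, hidden unit, input feature) triple, which is why the cost grows with batch size, representation width, and feature count.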

Main Building Blocks

The implementation lives in `deeptab/architectures/experimental/tangos.py`.

Component	Implementation	Role
Dense body	`nn.ModuleList` of linear, normalization, activation, dropout layers	Learns tabular representation
Optional GLU	`nn.GLU()` when `use_glu=True`	Gated dense transformations
Optional skip connections	Shape-matched residual additions	Stabilizes deeper MLPs
Representation function	`repr_forward`	Hidden representation used for Jacobian attribution
Jacobian computation	`torch.func.vmap(torch.func.jacrev(...))`	Computes per-sample hidden-unit attributions
Specialization loss	L1 norm of attribution tensor	Encourages sparse feature usage
Orthogonality loss	Cosine similarity between neuron attributions	Encourages diverse hidden units
Output head	`nn.Linear(last_hidden, num_classes)`	Task prediction
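Two rows of the table are easier to picture with tensors. The sketch below shows that `nn.GLU()` halves the last dimension and that a shape-matched residual is only added when input and output shapes agree. The helper `add_skip` and the choice to double the preceding linear width are assumptions made for illustration; how DeepTab sizes layers when `use_glu=True` is not stated here.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 32)

# nn.GLU splits the last dimension into halves (a, b) and returns a * sigmoid(b),
# so a 64-wide linear output becomes a 32-wide gated output.
glu_layer = nn.Sequential(nn.Linear(32, 64), nn.GLU())
gated = glu_layer(x)  # shape (4, 32)


def add_skip(inp, out):
    # Hypothetical helper: add the residual only when shapes match.
    return out + inp if out.shape == inp.shape else out


same = add_skip(x, gated)                     # shapes match, residual added
narrower = add_skip(x, nn.Linear(32, 16)(x))  # shapes differ, returned unchanged
```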

Configuration

Parameter	Default	Practical Effect
`layer_sizes`	`[256, 128, 32]`	Width/depth of the MLP body
`dropout`	`0.2`	Standard dropout regularization
`activation`	`nn.ReLU()`	Hidden activation
`use_glu`	`False`	Enables gated linear units
`skip_connections`	`False`	Adds residual connections when shapes match
`batch_norm`	inherited default `False`	Optional batch normalization
`layer_norm`	inherited default `False`	Optional layer normalization
`lamda1`	`0.5`	Weight for specialization penalty
`lamda2`	`0.1`	Weight for orthogonality penalty
`subsample`	`0.5`	Fraction used for regularization pair sampling

from deeptab.configs import PreprocessingConfig, TangosConfig, TrainerConfig
from deeptab.models.experimental import TangosRegressor

model = TangosRegressor(
    model_config=TangosConfig(
        layer_sizes=[256, 128, 32],  # width of each hidden layer
        dropout=0.2,
        lamda1=0.5,  # weight of the specialization penalty
        lamda2=0.1,  # weight of the orthogonality penalty
        subsample=0.5,
    ),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="standard"),
    trainer_config=TrainerConfig(lr=1e-3, batch_size=128, max_epochs=100),
    random_state=101,  # fixed seed for reproducibility
)

Practical Guide

Dataset Condition	Recommendation
Small or noisy data	Try Tangos against MLP/ResNet; the regularizer may help
Very high feature count	Watch Jacobian memory and runtime
Large batch sizes	Reduce batch size if Jacobian computation is slow or memory-heavy
Need fast training	Prefer MLP, ResNet, or TabM
Want attribution diversity analysis	Tangos is a useful research model
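A rough way to judge the Jacobian rows in the table is to estimate the size of the Jacobian tensor itself. The helper below is a back-of-the-envelope sketch: `jacobian_bytes` is a hypothetical name, float32 storage is assumed, and the figure is a lower bound because the autograd graph needed to backpropagate through the penalty takes additional memory.

```python
def jacobian_bytes(batch_size, hidden_units, n_features, bytes_per_element=4):
    # One entry per (sample, hidden unit, input feature) triple.
    return batch_size * hidden_units * n_features * bytes_per_element


# Batch size 128 and a 32-unit final hidden layer (the last entry of the
# default layer_sizes), for increasing numbers of preprocessed features.
for n_features in (20, 200, 2000):
    mib = jacobian_bytes(128, 32, n_features) / 2**20
    print(f"{n_features} features: {mib:.2f} MiB")
```

The estimate scales linearly in each factor, so halving the batch size halves the Jacobian tensor.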

Suggested search space:

param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile"],
    "model_config__layer_sizes": [[128, 64], [256, 128, 32], [512, 256, 128]],
    "model_config__dropout": [0.0, 0.1, 0.2, 0.3],
    "model_config__lamda1": [0.1, 0.5, 1.0],
    "model_config__lamda2": [0.01, 0.1, 0.5],
    "model_config__subsample": [0.25, 0.5],
    "trainer_config__lr": [3e-4, 1e-3],
    "trainer_config__batch_size": [64, 128, 256],
}
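The grid above is large when enumerated exhaustively, so it may be more practical to sample from it. The sketch below counts the combinations and draws a random subset using only the standard library; the budget of 20 configurations and the seed are arbitrary choices for illustration.

```python
import math
import random

param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile"],
    "model_config__layer_sizes": [[128, 64], [256, 128, 32], [512, 256, 128]],
    "model_config__dropout": [0.0, 0.1, 0.2, 0.3],
    "model_config__lamda1": [0.1, 0.5, 1.0],
    "model_config__lamda2": [0.01, 0.1, 0.5],
    "model_config__subsample": [0.25, 0.5],
    "trainer_config__lr": [3e-4, 1e-3],
    "trainer_config__batch_size": [64, 128, 256],
}

# Size of the full Cartesian product of all options.
n_combinations = math.prod(len(values) for values in param_grid.values())
print(n_combinations)  # 2592

# Draw a fixed budget of random configurations instead of the full grid.
rng = random.Random(0)
sampled = [
    {name: rng.choice(values) for name, values in param_grid.items()}
    for _ in range(20)
]
```

Because each Tangos fit pays for the Jacobian penalty, a sampled search is usually a more realistic budget than all 2592 combinations.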

Nuances and Limitations

DeepTab’s training module adds the penalty to the task loss automatically; this happens because Tangos implements `penalty_forward`.
`lamda1` and `lamda2` are regularization weights, not learning rates.
The Jacobian-based penalty can be substantially more expensive than a plain MLP forward/backward pass.
The implementation concatenates preprocessed raw feature tensors directly; it does not currently use `EmbeddingLayer` in the active forward path.
`subsample` controls the cost and variance of the regularization estimate. Report it in experiments.
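The first point, a penalty hook that the training loop picks up, can be sketched generically. The code below is not DeepTab's training module: `TinyModel`, `training_loss`, and the placeholder weight-based penalty are hypothetical and only show how a loss function might add `penalty_forward` when a model defines it.

```python
import torch
from torch import nn

torch.manual_seed(0)


class TinyModel(nn.Module):
    """Hypothetical model exposing a penalty hook."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3, 1)

    def forward(self, x):
        return self.net(x)

    def penalty_forward(self, x):
        # Placeholder penalty (NOT the Tangos regularizer): L1 norm of the weights.
        return 0.01 * self.net.weight.abs().sum()


def training_loss(model, x, y, loss_fn):
    # Task loss first, then the optional model-specific penalty.
    loss = loss_fn(model(x), y)
    if hasattr(model, "penalty_forward"):
        loss = loss + model.penalty_forward(x)
    return loss


model = TinyModel()
x, y = torch.randn(8, 3), torch.randn(8, 1)
loss_fn = nn.MSELoss()
total = training_loss(model, x, y, loss_fn)
task_only = loss_fn(model(x), y)
```

Under this pattern a model without the hook trains on the task loss alone, which matches the statement that the penalty exists only because the hook is implemented.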

When to Use

Use Tangos when the research question is about MLP regularization, feature-attribution structure, or hidden-unit specialization. Prefer MLP/ResNet/TabM when you need a fast production candidate or a strong simple baseline.

References

Jeffares, A., Liu, T., Crabbé, J., Imrie, F., & van der Schaar, M. (2023). TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization. ICLR 2023. arXiv:2303.05506