Tangos
Tangos is an MLP-style tabular model with a gradient-attribution regularizer. It encourages hidden units to become specialized and diverse by penalizing latent-unit attributions with respect to input features.
Warning
Experimental model: Tangos is not covered by stable-model semantic versioning. Pin the exact DeepTab version for reproducible experiments.
Overview
Tangos is not a custom optimizer in the current DeepTab implementation. It is a feedforward network trained with the normal DeepTab optimizer, plus an additional penalty computed from the Jacobian of hidden representations with respect to input features.
The research hypothesis is that tabular MLPs generalize better when hidden units:
specialize on a sparse subset of input features, and
avoid learning highly overlapping feature attributions.
| Property | DeepTab Tangos |
|---|---|
| Base architecture | MLP |
| Additional mechanism | Jacobian-based specialization and orthogonalization penalty |
| Training hook | |
| Main cost driver | |
| Best baseline comparisons | MLP, ResNet, TabM |
Architectural Details
The forward path is a standard dense network:
```text
raw preprocessed features
        |
Linear -> activation -> dropout
        |
Linear -> activation -> dropout
        |
       ...
        |
Linear output head
```
During training, Tangos computes a representation Jacobian:
\[ J_{h,x} = \frac{\partial h(x)}{\partial x} \]
where \(h(x)\) is the representation before the final output head. The model builds latent-unit attribution vectors from this Jacobian and adds:
a specialization term, based on the L1 norm of neuron attributions, and
an orthogonality term, based on cosine similarity between attribution vectors of different hidden units.
The training loss is:
\[ \mathcal{L}_{\text{total}} = \mathcal{L}_{\text{task}} + \lambda_1 \mathcal{L}_{\text{spec}} + \lambda_2 \mathcal{L}_{\text{orth}} \]
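The two penalty terms can be sketched with NumPy for a single tanh layer, where the Jacobian has a closed form. This is a minimal illustration under stated assumptions, not DeepTab's implementation: the function names are hypothetical, the L1 term is averaged over all entries, and the orthogonality term uses the absolute cosine similarity over all pairs of hidden units. A real implementation would typically obtain the Jacobian from automatic differentiation rather than a closed form.

```python
import numpy as np


def hidden_jacobian(x, W, b):
    """Jacobian of h(x) = tanh(x @ W.T + b) with respect to x.

    x: (batch, n_features); W: (n_hidden, n_features); b: (n_hidden,).
    Returns attributions of shape (batch, n_hidden, n_features).
    """
    h = np.tanh(x @ W.T + b)  # (batch, n_hidden)
    # d tanh(z)/dz = 1 - tanh(z)^2, and dz_i/dx_j = W[i, j].
    return (1.0 - h**2)[:, :, None] * W[None, :, :]


def specialization_loss(attr):
    """Mean absolute attribution (an L1 norm averaged over all entries)."""
    return np.abs(attr).mean()


def orthogonality_loss(attr, eps=1e-8):
    """Mean absolute cosine similarity over distinct pairs of hidden units."""
    norms = np.linalg.norm(attr, axis=2, keepdims=True)
    unit = attr / np.maximum(norms, eps)        # normalize each unit's vector
    cos = unit @ unit.transpose(0, 2, 1)        # (batch, n_hidden, n_hidden)
    i, j = np.triu_indices(attr.shape[1], k=1)  # index pairs with i < j
    return np.abs(cos[:, i, j]).mean()


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))  # 4 samples, 6 input features
W = rng.normal(size=(3, 6))  # 3 hidden units
b = np.zeros(3)

attr = hidden_jacobian(x, W, b)
task_loss = 0.25             # placeholder standing in for the task loss
lambda1, lambda2 = 0.5, 0.1  # illustrative penalty weights
total = (
    task_loss
    + lambda1 * specialization_loss(attr)
    + lambda2 * orthogonality_loss(attr)
)
print(attr.shape, float(total))
```

A lower specialization value means each hidden unit depends on fewer features; a lower orthogonality value means the units' attribution vectors overlap less.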
Main Building Blocks
The implementation lives in deeptab/architectures/experimental/tangos.py.
| Component | Implementation | Role |
|---|---|---|
| Dense body | | Learns tabular representation |
| Optional GLU | | Gated dense transformations |
| Optional skip connections | Shape-matched residual additions | Stabilizes deeper MLPs |
| Representation function | | Hidden representation used for Jacobian attribution |
| Jacobian computation | | Computes per-sample hidden-unit attributions |
| Specialization loss | L1 norm of attribution tensor | Encourages sparse feature usage |
| Orthogonality loss | Cosine similarity between neuron attributions | Encourages diverse hidden units |
| Output head | | Task prediction |
Configuration
| Parameter | Default | Practical Effect |
|---|---|---|
| | | Width/depth of the MLP body |
| | | Standard dropout regularization |
| | | Hidden activation |
| | | Enables gated linear units |
| | | Adds residual connections when shapes match |
| | inherited default | Optional batch normalization |
| | inherited default | Optional layer normalization |
| | | Weight for specialization penalty |
| | | Weight for orthogonality penalty |
| | | Fraction used for regularization pair sampling |
```python
from deeptab.configs import PreprocessingConfig, TangosConfig, TrainerConfig
from deeptab.models.experimental import TangosRegressor

model = TangosRegressor(
    model_config=TangosConfig(
        layer_sizes=[256, 128, 32],  # widths of the hidden layers
        dropout=0.2,
        lamda1=0.5,  # weight of the specialization penalty
        lamda2=0.1,  # weight of the orthogonality penalty
        subsample=0.5,  # fraction used for regularization pair sampling
    ),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="standard"),
    trainer_config=TrainerConfig(lr=1e-3, batch_size=128, max_epochs=100),
    random_state=101,
)
```
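To give a feel for what `subsample=0.5` means in terms of work, the sketch below samples a fraction of hidden-unit pairs using only the standard library. It assumes that the penalized representation is the 32-unit last hidden layer from the example and that the fraction applies to the set of unit pairs; the helper name is hypothetical and DeepTab's actual sampling scheme may differ.

```python
import itertools
import random


def sample_unit_pairs(n_hidden, subsample, seed=0):
    """Return a random subset of hidden-unit pairs (i, j) with i < j."""
    all_pairs = list(itertools.combinations(range(n_hidden), 2))
    # Keep at least one pair when any exist, and never more than are available.
    n_keep = min(len(all_pairs), max(1, int(subsample * len(all_pairs))))
    return random.Random(seed).sample(all_pairs, n_keep)


pairs = sample_unit_pairs(n_hidden=32, subsample=0.5)
print(len(pairs))  # 248 of the 496 possible pairs
```

A smaller fraction means fewer similarity terms per step, at the price of a noisier estimate of the orthogonality penalty.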
Practical Guide
| Dataset Condition | Recommendation |
|---|---|
| Small or noisy data | Try Tangos against MLP/ResNet; the regularizer may help |
| Very high feature count | Watch Jacobian memory and runtime |
| Large batch sizes | Reduce batch size if Jacobian computation is slow or memory-heavy |
| Need fast training | Prefer MLP, ResNet, or TabM |
| Want attribution diversity analysis | Tangos is a useful research model |
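A rough lower bound on the memory needed to hold the representation Jacobian follows from its shape. The sketch below assumes a dense float32 tensor of shape (batch, hidden units, features) is materialized at once; the helper name is hypothetical, and autograd bookkeeping would add to the figure.

```python
def jacobian_bytes(batch_size, n_hidden, n_features, bytes_per_value=4):
    """Size in bytes of a dense (batch, hidden, features) Jacobian (float32 by default)."""
    return batch_size * n_hidden * n_features * bytes_per_value


# Illustrative setting: batch size 128 and a 32-unit representation.
for n_features in (50, 1_000, 20_000):
    mib = jacobian_bytes(128, 32, n_features) / 2**20
    print(f"{n_features:>6} features -> {mib:,.1f} MiB")
```

With these settings, 1,000 features correspond to about 15.6 MiB for the Jacobian alone, and the figure grows linearly with both batch size and feature count.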
Suggested search space:
```python
param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile"],
    "model_config__layer_sizes": [[128, 64], [256, 128, 32], [512, 256, 128]],
    "model_config__dropout": [0.0, 0.1, 0.2, 0.3],
    "model_config__lamda1": [0.1, 0.5, 1.0],
    "model_config__lamda2": [0.01, 0.1, 0.5],
    "model_config__subsample": [0.25, 0.5],
    "trainer_config__lr": [3e-4, 1e-3],
    "trainer_config__batch_size": [64, 128, 256],
}
```
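The full Cartesian product of this grid is large, which matters given the cost of each Tangos run. The standard-library sketch below counts the combinations and draws a random subset; `sample_configs` is a hypothetical helper, not part of DeepTab.

```python
import itertools
import math
import random

param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile"],
    "model_config__layer_sizes": [[128, 64], [256, 128, 32], [512, 256, 128]],
    "model_config__dropout": [0.0, 0.1, 0.2, 0.3],
    "model_config__lamda1": [0.1, 0.5, 1.0],
    "model_config__lamda2": [0.01, 0.1, 0.5],
    "model_config__subsample": [0.25, 0.5],
    "trainer_config__lr": [3e-4, 1e-3],
    "trainer_config__batch_size": [64, 128, 256],
}

# Number of configurations an exhaustive search would train.
n_total = math.prod(len(values) for values in param_grid.values())
print(n_total)  # 2592


def sample_configs(grid, n_samples, seed=0):
    """Draw distinct random configurations from the Cartesian product of a grid."""
    keys = list(grid)
    combos = list(itertools.product(*(grid[key] for key in keys)))
    chosen = random.Random(seed).sample(combos, min(n_samples, len(combos)))
    return [dict(zip(keys, combo)) for combo in chosen]


candidates = sample_configs(param_grid, n_samples=30)
print(len(candidates))  # 30
```

If the estimator is scikit-learn compatible, as the double-underscore parameter names suggest, a randomized search utility such as scikit-learn's `RandomizedSearchCV` serves the same purpose.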
Nuances and Limitations
The penalty is computed only because `Tangos` implements `penalty_forward`; DeepTab's training module adds the penalty to task loss automatically.
`lamda1` and `lamda2` are not learning rates. They are regularization weights.
The Jacobian-based penalty can be substantially more expensive than a plain MLP forward/backward pass.
The implementation concatenates preprocessed raw feature tensors directly; it does not currently use `EmbeddingLayer` in the active forward path.
`subsample` controls regularization estimation cost and variance. Report it in experiments.
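The hook-based design in the first point can be illustrated with a generic pattern: a training step adds a penalty only when the model exposes one. This is not DeepTab's training module. The toy classes, the `task_loss` method, and the assumption that `penalty_forward` takes a batch and returns a scalar are all hypothetical; the real signature may differ.

```python
class ToyPlainModel:
    """Stand-in for a model without a regularizer (hypothetical)."""

    def task_loss(self, batch):
        return 1.0  # placeholder task loss


class ToyPenalizedModel(ToyPlainModel):
    """Stand-in for a model exposing a penalty hook (hypothetical signature)."""

    def penalty_forward(self, batch):
        return 0.25  # placeholder penalty value


def training_loss(model, batch):
    """Add the model's penalty to the task loss only if the hook exists."""
    loss = model.task_loss(batch)
    penalty_hook = getattr(model, "penalty_forward", None)
    if callable(penalty_hook):
        loss = loss + penalty_hook(batch)
    return loss


print(training_loss(ToyPlainModel(), batch=None))      # 1.0
print(training_loss(ToyPenalizedModel(), batch=None))  # 1.25
```

Under this pattern, models without the hook train exactly as before, which is why the regularizer needs no custom optimizer.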
When to Use
Use Tangos when the research question is about MLP regularization, feature-attribution structure, or hidden-unit specialization. Prefer MLP/ResNet/TabM when you need a fast production candidate or a strong simple baseline.
References
Jeffares, A., Liu, T., Crabbé, J., Imrie, F., & van der Schaar, M. (2023). TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization. ICLR 2023. arXiv:2303.05506
See Also
MLP - stable dense baseline
ResNet - stable residual dense baseline
TabM - parameter-efficient ensemble baseline
Model Tiers - experimental vs stable models