AutoInt
Overview
AutoInt learns feature interactions with stacked multi-head self-attention layers. It treats tabular columns as feature tokens, repeatedly attends across tokens, flattens the final token sequence, and predicts with a linear head.
Use AutoInt when the main research question is automatic feature interaction learning rather than full Transformer encoder modeling.
Architectural Details
DeepTab’s AutoInt implementation uses:
- EmbeddingLayer to create a (batch, n_features, d_model) token sequence.
- A stack of n_layers attention interaction layers.
- Each layer applies LayerNorm, nn.MultiheadAttention, a residual connection, a linear projection, and a second residual connection.
- The final token sequence is flattened and passed to a linear output head.
feature tokens -> [LayerNorm -> MultiheadAttention -> residual -> Linear -> residual] x n_layers -> flatten -> Linear
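The data flow above can be sketched in plain PyTorch. This is a minimal illustration, not DeepTab's actual code: the class names `InteractionLayer` and `TinyAutoInt` are hypothetical, the random tensor stands in for the EmbeddingLayer output, and the exact placement of the two residual connections is one plausible reading of the description.

```python
import torch
from torch import nn


class InteractionLayer(nn.Module):
    """Hypothetical block: LayerNorm -> self-attention -> residual -> Linear -> residual."""

    def __init__(self, d_model: int, n_heads: int, attn_dropout: float = 0.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        # batch_first=True keeps tensors shaped (batch, n_features, d_model).
        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=attn_dropout, batch_first=True
        )
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        attended, _ = self.attn(h, h, h, need_weights=False)
        x = x + attended         # first residual, around attention
        return x + self.proj(x)  # second residual, around the linear projection


class TinyAutoInt(nn.Module):
    """Hypothetical stack of interaction layers with a flatten + linear head."""

    def __init__(self, n_features: int, d_model: int = 32, n_layers: int = 2,
                 n_heads: int = 4, n_outputs: int = 1):
        super().__init__()
        self.layers = nn.ModuleList(
            [InteractionLayer(d_model, n_heads) for _ in range(n_layers)]
        )
        # Flattening means the head sees every token state at once.
        self.head = nn.Linear(n_features * d_model, n_outputs)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            tokens = layer(tokens)
        return self.head(tokens.flatten(start_dim=1))


tokens = torch.randn(8, 10, 32)  # stand-in for (batch, n_features, d_model) tokens
model = TinyAutoInt(n_features=10)
logits = model(tokens)
print(logits.shape)  # torch.Size([8, 1])
```

Because the head consumes the flattened sequence, its input width is n_features * d_model, so the head grows with the number of columns rather than staying fixed as it would with pooling.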
Main Building Blocks
| Component | DeepTab implementation | Role |
|---|---|---|
| Tokenizer | EmbeddingLayer | Builds feature tokens. |
| Interaction layer | LayerNorm + nn.MultiheadAttention | Learns pairwise and higher-order token interactions. |
| Residual projection | Linear projection with residual | Updates each attended token. |
| Output head | Linear on the flattened tokens | Uses all token states for prediction. |
Implementation Notes
AutoIntConfig exposes kv_compression and kv_compression_sharing, and the architecture constructs compression layers. In the current DeepTab forward path, those compression layers are not applied to the attention call; the runtime behavior is standard multi-head self-attention over all feature tokens.
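The difference between the runtime behavior and a compressed variant shows up in the shape of the attention weights. The sketch below is an illustration under assumptions: the compressed branch is a hypothetical Linformer-style projection along the token axis, shared between keys and values, and is not taken from DeepTab's code.

```python
import torch
from torch import nn

batch, n_features, d_model, n_heads = 4, 12, 32, 4
tokens = torch.randn(batch, n_features, d_model)
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Runtime behavior described above: keys and values are the full token sequence.
_, full_weights = attn(tokens, tokens, tokens, need_weights=True)

# Hypothetical compressed variant (NOT applied in the current forward path):
# project the token axis of keys/values from n_features down to compressed_len.
compressed_len = 3
compress = nn.Linear(n_features, compressed_len, bias=False)
kv = compress(tokens.transpose(1, 2)).transpose(1, 2)  # (batch, compressed_len, d_model)
_, compressed_weights = attn(tokens, kv, kv, need_weights=True)

print(full_weights.shape)        # torch.Size([4, 12, 12])
print(compressed_weights.shape)  # torch.Size([4, 12, 3])
```

With standard self-attention every feature token attends to all n_features tokens, so the weight matrix is square. A compressed path would shrink its last dimension.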
The config field is named fprenorm, while the architecture checks prenorm for last_norm. Unless this is aligned in code, the final optional normalization path is effectively inactive with the default config field name.
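A name mismatch of this kind can be reproduced with a toy config. Everything here is hypothetical: `ToyConfig`, `AlignedConfig`, and `wants_last_norm` are illustrative names, and the getattr-with-default lookup is an assumption about how such a flag might be read, not a copy of DeepTab's code.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-in for the config: the flag is spelled "fprenorm".
    fprenorm: bool = True


@dataclass
class AlignedConfig:
    # Same flag, spelled the way the lookup below expects.
    prenorm: bool = True


def wants_last_norm(config) -> bool:
    # Assumed lookup style: read "prenorm" and fall back to False when absent.
    return bool(getattr(config, "prenorm", False))


print(wants_last_norm(ToyConfig()))      # False: the flag is set but never seen
print(wants_last_norm(AlignedConfig()))  # True: names agree, the norm path activates
```

Under this assumption, setting fprenorm has no effect on the final normalization until the two names are aligned.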
Practical Config
```python
from deeptab.configs import AutoIntConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import AutoIntClassifier

model = AutoIntClassifier(
    model_config=AutoIntConfig(
        d_model=128,       # token width
        n_layers=4,        # number of interaction layers
        n_heads=8,         # 128 / 8 = 16 dimensions per attention head
        attn_dropout=0.2,  # dropout on the attention weights
    ),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
    trainer_config=TrainerConfig(lr=3e-4, batch_size=128, max_epochs=100),
    random_state=101,
)
```
Key settings:
| Setting | Typical range | Effect |
|---|---|---|
| d_model | | Token width. |
| n_layers | | Number of interaction layers. |
| n_heads | | Attention heads; must divide d_model. |
| attn_dropout | | Attention regularization. |
| kv_compression | Present in config | Not used by the current forward path. |
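The divisibility constraint and the width of the flattened head input can be checked before training. This is a small sketch with a hypothetical helper, `attention_sizes`, and a hypothetical feature count of 20. It is not part of DeepTab.

```python
def attention_sizes(d_model: int, n_heads: int, n_features: int) -> dict:
    """Return the per-head width and the flattened head input width."""
    if d_model % n_heads != 0:
        raise ValueError(
            f"n_heads={n_heads} must divide d_model={d_model} evenly"
        )
    return {
        "head_dim": d_model // n_heads,           # width of each attention head
        "flattened_width": n_features * d_model,  # input size of the linear head
    }


# Values from the config above, with a hypothetical 20 feature columns.
sizes = attention_sizes(d_model=128, n_heads=8, n_features=20)
print(sizes)  # {'head_dim': 16, 'flattened_width': 2560}
```

A combination such as d_model=128 with n_heads=6 raises a ValueError here, which mirrors the requirement that the embedding width splits evenly across heads.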
When To Use
Use AutoInt for attention-based feature interaction studies and as a lighter alternative to full Transformer encoders. Prefer FTTransformer when you need a feed-forward Transformer block and sequence pooling.
References
Song et al., AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks.
Vaswani et al., Attention Is All You Need.