TabR
Overview
TabR is a retrieval-augmented tabular model. It encodes the current row and candidate training rows into a latent space, retrieves nearest candidate contexts with FAISS, mixes candidate labels into the representation, and predicts with a neural head.
Use TabR when local neighborhood structure is likely to matter and you can afford train-set candidate retrieval during training, validation, and prediction.
Architectural Details
DeepTab’s TabR implementation has three conceptual modules:
- Encoder (E): project input features to d_main and optionally apply residual MLP encoder blocks.
- Retrieval (R): compute keys with K, search nearest candidate keys using FAISS, encode candidate labels, and compute attention-like weights over contexts.
- Predictor (P): combine retrieved context with the query representation and apply residual predictor blocks plus a normalized output head.
```text
query features -> encoder -> key
candidate features -> encoder -> candidate keys -> FAISS nearest neighbors
candidate labels + key differences -> retrieved context -> predictor -> output
```
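The retrieval flow above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not DeepTab's code: a brute-force search stands in for FAISS, the similarity is assumed to be the negative squared L2 distance between keys, and `retrieve_context`, `label_emb`, and `T` are hypothetical names introduced only for this example.

```python
import numpy as np

def retrieve_context(query_key, cand_keys, cand_labels, label_emb, T, context_size):
    """Toy retrieval step for a single query (hypothetical helper, not DeepTab API)."""
    # Brute-force nearest neighbors; FAISS would replace this search at scale.
    sq_dist = ((cand_keys - query_key) ** 2).sum(axis=1)
    idx = np.argsort(sq_dist)[:context_size]

    # Attention-like weights: softmax over negative squared distances (assumed similarity).
    logits = -sq_dist[idx]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Value per neighbor: encoded label plus a linear transform of the key difference.
    values = label_emb[cand_labels[idx]] + (query_key - cand_keys[idx]) @ T
    return idx, weights, weights @ values

rng = np.random.default_rng(0)
d_main, n_candidates, n_classes = 8, 50, 3
cand_keys = rng.normal(size=(n_candidates, d_main))
cand_labels = rng.integers(0, n_classes, size=n_candidates)
label_emb = rng.normal(size=(n_classes, d_main))  # stand-in label encoder
T = rng.normal(size=(d_main, d_main)) * 0.1       # stand-in context transform
query_key = rng.normal(size=d_main)

idx, weights, context = retrieve_context(
    query_key, cand_keys, cand_labels, label_emb, T, context_size=5
)
print(idx.shape, context.shape)
```

In a real model the resulting context vector would be combined with the query representation and passed to the predictor; here it only shows how labels and key differences enter the retrieved context.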
Main Building Blocks
| Component | DeepTab implementation | Role |
|---|---|---|
| Optional tokenizer | | Embeds features before retrieval when |
| Encoder | | Builds row representation and retrieval key. |
| Candidate search | | Retrieves nearest candidate keys. |
| Label encoder | | Converts candidate labels to vectors. |
| Context transform | | Adjusts retrieved values by query-context difference. |
| Predictor | | Produces task output. |
Implementation Notes
TabR sets uses_candidates=True, so it has specialized candidate-aware training, validation, and prediction methods. The standard forward method exists for baseline compatibility, but proper TabR behavior depends on candidate data.
The implementation lazily imports delu and faiss. Install the appropriate FAISS package for your hardware before using TabR in experiments.
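Because the imports are lazy, a missing dependency may only surface once a TabR model is actually built or trained. A small pre-flight check, sketched here with the standard library only, can report the problem earlier. The helper name `missing_tabr_dependencies` is hypothetical and not part of DeepTab.

```python
import importlib.util

def missing_tabr_dependencies(modules=("delu", "faiss")):
    """Return the optional modules that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = missing_tabr_dependencies()
if missing:
    print("Install before using TabR:", ", ".join(missing))
else:
    print("delu and faiss are importable")
```

The import name `faiss` is the same for CPU and GPU builds, so this check confirms that some FAISS build is present but not which one.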
Practical Config
```python
from deeptab.configs import PreprocessingConfig, TabRConfig, TrainerConfig
from deeptab.models import TabRRegressor

model = TabRRegressor(
    model_config=TabRConfig(
        d_main=256,            # representation width
        context_size=96,       # neighbors retrieved per query
        predictor_n_blocks=1,  # post-retrieval predictor depth
        encoder_n_blocks=0,    # query/candidate encoder depth
        context_dropout=0.2,
        memory_efficient=False,
    ),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
    trainer_config=TrainerConfig(lr=3e-4, batch_size=128, max_epochs=100),
    random_state=101,
)
```
Key settings:
| Setting | Typical range | Effect |
|---|---|---|
| | | Retrieval and predictor representation width. |
| | | Number of neighbors used per query. |
| | | Query/candidate encoder depth. |
| | | Post-retrieval predictor depth. |
| | | Chunked candidate encoding. |
| | | Reduces memory at extra compute cost. |
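The idea behind chunked candidate encoding can be illustrated with a generic sketch: encode the candidate set in slices so that only one slice of intermediate activations is alive at a time. This is an assumption about the general technique, not a description of DeepTab's internals. `encode_in_chunks`, the toy `encode` function, and `chunk_size` are hypothetical names.

```python
import numpy as np

def encode_in_chunks(encode, candidates, chunk_size):
    """Encode candidates slice by slice to bound peak memory (hypothetical helper)."""
    parts = [
        encode(candidates[start:start + chunk_size])
        for start in range(0, len(candidates), chunk_size)
    ]
    return np.concatenate(parts, axis=0)

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))  # stand-in for learned encoder weights

def encode(x):
    # Toy encoder: one linear layer followed by tanh.
    return np.tanh(x @ W)

candidates = rng.normal(size=(1000, 10))
full = encode(candidates)
chunked = encode_in_chunks(encode, candidates, chunk_size=128)
print(full.shape, chunked.shape)
```

Smaller chunks lower the peak memory needed for encoding but add per-chunk overhead, which is the trade-off the last two rows of the table describe.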
When To Use
Use TabR when nearest-neighbor methods are already a strong baseline, especially on datasets with local smoothness, repeated profiles, or label neighborhoods. Budget for the retrieval cost, and state the candidate-set leakage rules explicitly in the experimental protocol.
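One leakage rule worth making explicit is whether a training row may retrieve itself, since its own label would then flow straight into the prediction. The sketch below shows the effect with a brute-force search in NumPy. It assumes candidates are drawn from the training split only. `knn_indices` and `query_cand_idx` are hypothetical names, and this is an illustration of the protocol concern rather than DeepTab's implementation.

```python
import numpy as np

def knn_indices(query_keys, cand_keys, k, query_cand_idx=None):
    """Brute-force k-NN; optionally mask each query's own row in the candidate set.

    query_cand_idx[i] is the position of query i inside cand_keys, or None when
    the queries come from a held-out split (hypothetical convention).
    """
    # Pairwise squared L2 distances, shape (n_queries, n_candidates).
    d = ((query_keys[:, None, :] - cand_keys[None, :, :]) ** 2).sum(axis=-1)
    if query_cand_idx is not None:
        d[np.arange(len(query_keys)), query_cand_idx] = np.inf
    return np.argsort(d, axis=1)[:, :k]

rng = np.random.default_rng(0)
train_keys = rng.normal(size=(20, 4))
batch_idx = np.array([0, 5, 7])  # training rows used as queries

leaky = knn_indices(train_keys[batch_idx], train_keys, k=3)
clean = knn_indices(train_keys[batch_idx], train_keys, k=3, query_cand_idx=batch_idx)
print(leaky[:, 0])  # without masking, each query retrieves itself first
print(clean)        # with masking, no query appears among its own neighbors
```

For validation and test queries no masking is needed as long as the candidate set contains training rows only.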
References
Gorishniy et al., TabR: Tabular Deep Learning Meets Nearest Neighbors.
Cover and Hart, Nearest Neighbor Pattern Classification.