ModernNCA
ModernNCA is a differentiable nearest-neighbor model for tabular data. It learns a neural representation of each row, compares query rows to candidate rows in that representation space, and predicts by a softmax-weighted average of candidate labels.
Warning
Experimental model: ModernNCA is not covered by stable-model semantic versioning. Pin the exact DeepTab version for reproducible experiments.
Overview
ModernNCA revisits Neighborhood Component Analysis (NCA) with modern tabular deep-learning components. In DeepTab, it is implemented as a candidate-based model:
1. Encode each row into a learned representation.
2. Compute Euclidean distances from batch rows to candidate rows.
3. Convert negative distances into weights with a temperature-scaled softmax.
4. Predict by weighting candidate labels.
This makes ModernNCA useful when the target function is locally smooth in a representation space: rows with similar learned embeddings should have similar labels.
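The four steps above can be sketched in a few lines of NumPy for a regression target. This is a minimal illustration, not DeepTab's code. It assumes a fixed random projection `W` stands in for the learned encoder, and `soft_nn_predict` is a hypothetical helper name.

```python
import numpy as np


def soft_nn_predict(z_query, z_cand, y_cand, temperature=1.0):
    """Soft nearest-neighbor regression: weighted average of candidate targets."""
    # Step 2: Euclidean distances between every query and every candidate, shape (B, N_c)
    d = np.linalg.norm(z_query[:, None, :] - z_cand[None, :, :], axis=-1) / temperature
    # Step 3: softmax over candidates of the negative scaled distances
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    # Step 4: predictions are convex combinations of candidate targets
    return w @ y_cand, w


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))        # stand-in for the learned encoder phi (step 1)
x_cand = rng.normal(size=(20, 5))  # 20 candidate rows with 5 features
y_cand = 2.0 * x_cand[:, 0]        # synthetic regression target
x_query = rng.normal(size=(4, 5))  # 4 query rows

pred, weights = soft_nn_predict(x_query @ W, x_cand @ W, y_cand, temperature=0.75)
print(pred.shape)                  # (4,)
print(weights.sum(axis=1))         # each row sums to 1
```

Each row of weights is non-negative and sums to 1. As a result, every prediction is a convex combination of candidate targets and stays within their range.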
| Property | DeepTab ModernNCA |
|---|---|
| Inductive bias | Local similarity / soft nearest-neighbor prediction |
| Prediction form | Weighted candidate labels |
| Training mode | Candidate-aware via |
| Inference cost | Pairwise distance to candidate rows |
| Best baseline comparisons | TabR, TabM, ResNet, MLP |
Architectural Details
For a query row \(x_i\) and candidate rows \(\{(x_j, y_j)\}\), ModernNCA learns an encoder \(\phi_\theta\):
```text
raw features
    |
optional DeepTab feature embeddings
    |
linear encoder: input_dim -> dim
    |
residual post-encoder blocks
    |
embedding z = phi(x)
```
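The following NumPy sketch follows the encoder path in the diagram. It assumes each post-encoder block (BatchNorm -> Linear -> ReLU -> Dropout -> Linear) is added back to its input, as "residual" suggests. The helper names (`linear`, `batch_norm`, `encode`) are hypothetical. The random initialization and the use of BatchNorm without a learned scale/shift are illustrative assumptions, and the optional feature-embedding step is left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    # Hypothetical helper: scaled random weights and zero bias, for illustration only
    return rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim)), np.zeros(out_dim)


def batch_norm(h, eps=1e-5):
    # Training-mode BatchNorm without learned scale/shift: normalize each feature over the batch
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)


def encode(x, params, dropout=0.1, training=True):
    W_enc, b_enc = params["encoder"]
    z = x @ W_enc + b_enc                        # linear encoder: input_dim -> dim
    for (W1, b1), (W2, b2) in params["blocks"]:
        h = batch_norm(z)
        h = np.maximum(h @ W1 + b1, 0.0)         # Linear (dim -> d_block) + ReLU
        if training and dropout > 0:             # inverted dropout during training
            keep = rng.random(h.shape) >= dropout
            h = h * keep / (1.0 - dropout)
        z = z + (h @ W2 + b2)                    # Linear (d_block -> dim), residual add
    return z                                     # embedding z = phi(x)


input_dim, dim, d_block, n_blocks = 10, 16, 32, 2
params = {
    "encoder": linear(input_dim, dim),
    "blocks": [(linear(dim, d_block), linear(d_block, dim)) for _ in range(n_blocks)],
}
z = encode(rng.normal(size=(8, input_dim)), params)
print(z.shape)  # (8, 16)
```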
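Embedding distances are scaled by the temperature \(T\) and converted to candidate weights with a softmax over candidates \(j\):
\[ d_{ij} = \frac{\lVert \phi_\theta(x_i) - \phi_\theta(x_j) \rVert_2}{T} \]
\[ w_{ij} = \mathrm{softmax}_j(-d_{ij}) = \frac{\exp(-d_{ij})}{\sum_{k} \exp(-d_{ik})} \]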
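To see how the temperature affects the weights, take a single query \(i\) with two candidates whose unscaled embedding distances are 1 and 2. These are hypothetical values chosen for illustration. The weight on the closer candidate is:

\[
\begin{aligned}
T = 1:&\quad d_{i\cdot} = (1, 2), \quad w_{i1} = \frac{e^{-1}}{e^{-1} + e^{-2}} = \frac{1}{1 + e^{-1}} \approx 0.731 \\
T = 0.5:&\quad d_{i\cdot} = (2, 4), \quad w_{i1} = \frac{e^{-2}}{e^{-2} + e^{-4}} = \frac{1}{1 + e^{-2}} \approx 0.881
\end{aligned}
\]

Halving \(T\) moves more weight onto the closer candidate. This is why lower temperatures behave more like hard nearest-neighbor prediction.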
For regression, the output is the weighted average of candidate targets, \(\hat{y}_i = \sum_j w_{ij} y_j\). For classification, candidate labels are one-hot encoded, so the same weighted sum yields class probabilities. These probabilities are log-transformed before the loss is computed.
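The classification path can be sketched in NumPy. The text above says only that the weighted probabilities are log-transformed before the loss, so this sketch assumes a negative log-likelihood loss on those log-probabilities. Both helper names are hypothetical, and `eps` is an assumed guard against `log(0)`.

```python
import numpy as np


def soft_nn_log_probs(w, y_cand, n_classes, eps=1e-7):
    """w: (B, N_c) candidate weights with rows summing to 1; y_cand: (N_c,) integer labels."""
    one_hot = np.eye(n_classes)[y_cand]   # (N_c, n_classes) one-hot candidate labels
    probs = w @ one_hot                   # (B, n_classes) weighted class probabilities
    return np.log(probs + eps)            # eps keeps log finite when a class gets zero weight


def nll_loss(log_probs, y_true):
    # Mean negative log-likelihood of each row's true class
    return -log_probs[np.arange(len(y_true)), y_true].mean()


w = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
y_cand = np.array([0, 1, 1])
log_p = soft_nn_log_probs(w, y_cand, n_classes=2)
# Class probabilities: row 0 -> (0.7, 0.3), row 1 -> (0.1, 0.9)
loss = nll_loss(log_p, np.array([0, 1]))
print(loss)
```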
During training, DeepTab concatenates the current batch with a sampled subset of training candidates. The diagonal self-match for the current batch is masked to avoid a row predicting from its own label.
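A sketch of the training-time candidate set follows. It rests on three assumptions. First, candidates are drawn without replacement from training rows outside the current batch. Second, `sample_rate` is applied to that pool. Third, batch rows occupy the first B candidate columns, so each row's self-match lies on the diagonal of the leading B×B block. The function `training_candidate_weights` is hypothetical, not a DeepTab API.

```python
import numpy as np


def training_candidate_weights(z_train, batch_idx, sample_rate, temperature, rng):
    """Hypothetical helper: candidate weights for one training batch with self-matches masked."""
    # Sample candidates from training rows outside the current batch (assumption)
    pool = np.setdiff1d(np.arange(len(z_train)), batch_idx)
    n_sample = max(1, int(sample_rate * len(pool)))
    sampled = rng.choice(pool, size=n_sample, replace=False)

    # Batch rows come first, so batch row i is also candidate column i
    z_batch = z_train[batch_idx]
    z_cand = np.concatenate([z_batch, z_train[sampled]], axis=0)
    d = np.linalg.norm(z_batch[:, None, :] - z_cand[None, :, :], axis=-1) / temperature

    # Mask the diagonal self-match: infinite distance -> zero softmax weight
    B = len(batch_idx)
    d[np.arange(B), np.arange(B)] = np.inf

    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # finite max: other candidates remain
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    cand_idx = np.concatenate([batch_idx, sampled])  # training-row index of each column
    return w, cand_idx


rng = np.random.default_rng(0)
z_train = rng.normal(size=(100, 8))  # embeddings of 100 training rows
batch_idx = np.arange(16)            # current batch: the first 16 rows
w, cand_idx = training_candidate_weights(
    z_train, batch_idx, sample_rate=0.5, temperature=0.75, rng=rng
)
print(w.shape)  # (16, 58): 16 batch rows + int(0.5 * 84) = 42 sampled rows
```

Batch rows can still draw on each other off the diagonal; only each row's own label is hidden. Excluding the batch from the sampled pool is a choice made in this sketch. It keeps a row from reappearing as a duplicate candidate, which would bypass the diagonal mask.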
Main Building Blocks
The implementation lives in `deeptab/architectures/experimental/modern_nca.py`.
| Component | Implementation | Role |
|---|---|---|
| Optional feature embedding | | Converts raw columns into per-feature representations |
| Encoder | | Projects the flattened row into metric space |
| Post-encoder | Repeated BatchNorm -> Linear -> ReLU -> Dropout -> Linear blocks | Adds nonlinear representation capacity |
| Candidate weighting | | Differentiable neighbor weighting |
| Candidate prediction | Matrix multiply between weights and candidate labels | Produces regression values or class probabilities |
| Fallback head | | Keeps the forward pass usable when no candidates are supplied |
Configuration
| Parameter | Default | Practical Effect |
|---|---|---|
| | | Metric-space dimension after the encoder |
| | | Hidden width inside residual post-encoder blocks |
| | | Number of post-encoder blocks |
| | | Regularization inside post-encoder blocks |
| | | Softmax sharpness for candidate weighting |
| | | Fraction of candidate rows sampled during training |
| | | Default embedding type when embeddings are enabled |
| | | PLR frequency count |
| | | PLR initialization scale |
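```python
from deeptab.configs import ModernNCAConfig, PreprocessingConfig, TrainerConfig
from deeptab.models.experimental import ModernNCAClassifier

model = ModernNCAClassifier(
    model_config=ModernNCAConfig(
        dim=128,           # metric-space dimension after the encoder
        d_block=512,       # hidden width inside each post-encoder block
        n_blocks=4,        # number of post-encoder blocks
        dropout=0.1,       # regularization inside post-encoder blocks
        temperature=0.75,  # lower = sharper, more nearest-neighbor-like weights
        sample_rate=0.5,   # fraction of training candidates sampled per step
    ),
    preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
    trainer_config=TrainerConfig(lr=3e-4, batch_size=128, max_epochs=100),
    random_state=101,
)
```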
Practical Guide
| Dataset Condition | Recommendation |
|---|---|
| Small to medium data | ModernNCA is worth testing; candidate distance cost is manageable |
| Very large candidate pool | Reduce |
| Noisy labels | Increase |
| Strong local clusters | ModernNCA may be competitive with retrieval models |
| Latency-sensitive inference | Prefer MLP/ResNet/TabM unless candidate search is acceptable |
Suggested search space:
```python
param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile", "ple"],
    "model_config__dim": [64, 128, 256],
    "model_config__n_blocks": [2, 4, 6],
    "model_config__d_block": [256, 512],
    "model_config__dropout": [0.0, 0.1, 0.2],
    "model_config__temperature": [0.5, 0.75, 1.0],
    "model_config__sample_rate": [0.25, 0.5, 1.0],
    "trainer_config__lr": [1e-4, 3e-4, 5e-4],
}
```
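The full grid is large. The sketch below counts its combinations and draws a random subset of configurations using only the Python standard library. The grid is repeated so the snippet runs on its own. If the DeepTab estimators follow scikit-learn estimator conventions, which the double-underscore keys suggest but this page does not confirm, `sklearn.model_selection.RandomizedSearchCV` would be a natural alternative.

```python
import math
import random

# Same grid as above, repeated so this snippet runs on its own
param_grid = {
    "preprocessing_config__numerical_preprocessing": ["standard", "quantile", "ple"],
    "model_config__dim": [64, 128, 256],
    "model_config__n_blocks": [2, 4, 6],
    "model_config__d_block": [256, 512],
    "model_config__dropout": [0.0, 0.1, 0.2],
    "model_config__temperature": [0.5, 0.75, 1.0],
    "model_config__sample_rate": [0.25, 0.5, 1.0],
    "trainer_config__lr": [1e-4, 3e-4, 5e-4],
}

# Size of the full Cartesian grid
n_combinations = math.prod(len(values) for values in param_grid.values())
print(n_combinations)  # 4374

# Draw 20 random configurations (duplicates are possible) instead of the full grid
rng = random.Random(101)
sampled_configs = [
    {name: rng.choice(values) for name, values in param_grid.items()}
    for _ in range(20)
]
print(sampled_configs[0])
```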
Nuances and Limitations
- Candidate construction matters. Validation and test rows should retrieve only from training candidates, never from rows whose labels would leak evaluation information.
- `sample_rate` changes the stochastic training objective. Report it in benchmarks.
- `temperature` controls the effective number of neighbors. Lower values make predictions closer to nearest-neighbor behavior.
- Pairwise distance computation is the dominant cost: roughly \(O(B \cdot N_c \cdot \text{dim})\) for batch size \(B\) and candidate count \(N_c\).
- Compared with TabR, ModernNCA uses a simpler soft NCA-style label aggregation rather than TabR’s learned context/value transformation.
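One way to make "effective number of neighbors" concrete is the exponential of the entropy of the candidate weights, a perplexity measure. That definition is a choice made for this sketch and is not necessarily how DeepTab measures it. The sketch uses hypothetical distances from one query to 200 candidates. It also works through the distance-step cost \(B \cdot N_c \cdot \text{dim}\) for hypothetical sizes.

```python
import numpy as np


def effective_neighbors(distances, temperature):
    """exp(entropy) of softmax(-distances / temperature): one way to count 'effective' neighbors."""
    logits = -np.asarray(distances, dtype=float) / temperature
    logits -= logits.max()
    w = np.exp(logits)
    w /= w.sum()
    entropy = -np.sum(w * np.log(w + 1e-12))
    return float(np.exp(entropy))


rng = np.random.default_rng(0)
dists = rng.uniform(0.5, 3.0, size=200)  # hypothetical distances from one query to 200 candidates
temperatures = (0.1, 0.5, 0.75, 1.0, 2.0)
counts = [effective_neighbors(dists, t) for t in temperatures]
for t, c in zip(temperatures, counts):
    print(f"T={t}: about {c:.1f} effective neighbors")  # grows as T increases

# Distance-step cost for hypothetical sizes: B * N_c * dim multiply-adds per batch
B, N_c, dim = 128, 50_000, 128
print(B * N_c * dim)  # 819200000
```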
When to Use
Use ModernNCA when your hypothesis is that local neighborhoods in a learned representation space carry strong signal. Prefer TabM, ResNet, Mambular, or FTTransformer when you want a purely parametric model with simpler inference.
References
Goldberger, J., Roweis, S., Hinton, G., & Salakhutdinov, R. (2004). Neighbourhood Components Analysis. NeurIPS 2004.
Ye, H.-J., Yin, H.-H., Zhan, D.-C., & Chao, W.-L. (2025). Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later. ICLR 2025. OpenReview
Weinberger, K. Q., & Saul, L. K. (2009). Distance Metric Learning for Large Margin Nearest Neighbor Classification. JMLR.
See Also
TabR - stable retrieval-augmented tabular model
Recommended Configs - general tuning strategy
Model Tiers - experimental vs stable models