MLP
Overview
MLP is DeepTab’s plain feed-forward baseline for tabular data. It is the first model to include in most studies because it is fast, easy to tune, and makes very few assumptions beyond the quality of preprocessing and feature encoding.
Use it as a control model before moving to attention, retrieval, Mamba, or neural tree architectures. A well-tuned MLP is often competitive on medium-size tabular datasets, especially with good numerical scaling and categorical handling.
Architectural Details
DeepTab’s MLP implementation follows a simple pipeline:
Optionally embed numerical, categorical, and external embedding features with
EmbeddingLayer.Flatten embedded tokens to a single vector, or concatenate raw/preprocessed input tensors when
use_embeddings=False.Apply a sequence of linear layers from
layer_sizes.Optionally apply batch normalization, layer normalization, activation, GLU, dropout, and residual additions when dimensions match.
Project the final hidden representation to the task output dimension.
The forward path is:
features -> optional EmbeddingLayer -> flatten/concat -> Linear blocks -> output layer
Main Building Blocks
Component |
DeepTab implementation |
Role |
|---|---|---|
Feature input |
|
Builds the vector consumed by the MLP. |
Hidden stack |
|
Learns nonlinear feature interactions. |
Normalization |
|
Stabilizes training when enabled. |
Activation |
|
Controls nonlinear transformation. |
Skip connections |
|
Adds residual connections only when shapes match. |
Output head |
Final |
Produces logits or regression outputs. |
Implementation Notes
The default MLPConfig uses layer_sizes=[256, 128, 32] and dropout=0.2. The model does not require embeddings, so it works well with standard numerical preprocessing and integer/one-hot categorical preprocessing.
use_glu=True changes the hidden representation width because PyTorch nn.GLU halves the selected dimension. Use it only after checking layer dimensions, or prefer the default activation path for baseline experiments.
Practical Config
from deeptab.configs import MLPConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import MLPClassifier
model = MLPClassifier(
model_config=MLPConfig(
layer_sizes=[256, 128, 32],
dropout=0.2,
skip_connections=False,
),
preprocessing_config=PreprocessingConfig(numerical_preprocessing="standard"),
trainer_config=TrainerConfig(lr=1e-3, batch_size=256, max_epochs=100),
random_state=101,
)
Key settings:
Setting |
Typical range |
Effect |
|---|---|---|
|
|
Main capacity control. |
|
|
Regularization; increase on small/noisy data. |
|
|
Enables feature token embeddings before flattening. |
|
|
Embedding width when embeddings are used. |
|
|
Try when optimization is unstable. |
When To Use
Use MLP when you need a fast sanity check, a strong non-attention baseline, or a low-latency model. It is also a useful ablation target for evaluating whether a more complex architecture is actually adding value.
Avoid treating it as a weak baseline. Many tabular benchmarks show that tuned MLP/ResNet-style models can be difficult to beat without careful preprocessing and hyperparameter search.
References
Gorishniy et al., Revisiting Deep Learning Models for Tabular Data.
Shwartz-Ziv and Armon, Tabular Data: Deep Learning is Not All You Need.