NODE
Overview
NODE implements Neural Oblivious Decision Ensembles: differentiable oblivious decision trees trained inside a neural network. It is a useful bridge between tree-based inductive bias and gradient-based deep learning.
Use NODE when you want soft tree-like feature partitioning while keeping the sklearn-style DeepTab training interface.
Architectural Details
DeepTab’s NODE pipeline is:
Use raw/preprocessed concatenated features, or optionally embed features and flatten them.
Pass the vector through a
DenseBlockof differentiable oblivious trees.Flatten the dense block output.
Predict with
MLPhead.
features -> optional embeddings -> DenseBlock(num_layers, layer_dim, depth, tree_dim) -> MLPhead
Main Building Blocks
Component |
DeepTab implementation |
Role |
|---|---|---|
Input representation |
raw concatenation or |
Builds the vector consumed by trees. |
Differentiable trees |
|
Stacks NODE-style tree layers. |
Tree depth |
|
Controls number of soft splits per tree. |
Layer width |
|
Number of trees/features per dense layer. |
Head |
|
Maps tree representation to task output. |
Implementation Notes
num_layers * layer_dim determines the input dimension to the prediction head. Larger values increase capacity and memory use. tree_dim controls the output dimension per tree.
Practical Config
from deeptab.configs import NODEConfig, PreprocessingConfig, TrainerConfig
from deeptab.models import NODEClassifier
model = NODEClassifier(
model_config=NODEConfig(
num_layers=4,
layer_dim=128,
depth=6,
tree_dim=1,
head_dropout=0.3,
),
preprocessing_config=PreprocessingConfig(numerical_preprocessing="quantile"),
trainer_config=TrainerConfig(lr=1e-3, batch_size=256, max_epochs=100),
random_state=101,
)
Key settings:
Setting |
Typical range |
Effect |
|---|---|---|
|
|
Number of dense tree layers. |
|
|
Width of each tree layer. |
|
|
Soft decision depth. |
|
|
Output dimension per tree. |
|
|
Extra prediction-head capacity. |
When To Use
Use NODE when you want a differentiable tree ensemble baseline. Compare it with gradient-boosted trees and neural MLP/ResNet baselines because tree-like inductive bias can dominate or underperform depending on preprocessing and dataset size.