Distributional Regression#

Distributional regression predicts the full conditional distribution of the target rather than a single point estimate. This is useful when you need uncertainty estimates or when the target distribution is asymmetric or heavy-tailed.
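To make this concrete, here is a minimal sketch, independent of deeptab, of what a predicted distribution offers beyond a point estimate. It assumes a model has already produced a normal mean and standard deviation for each row; the values in predicted are made up for illustration, and only the Python standard library is used.

```python
from statistics import NormalDist

# Hypothetical predicted (mean, standard deviation) pairs for three rows.
# These are illustrative values, not output from any real model.
predicted = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]

intervals = []
for mean, std in predicted:
    dist = NormalDist(mu=mean, sigma=std)

    # Central 90% prediction interval from the 5% and 95% quantiles.
    lower, upper = dist.inv_cdf(0.05), dist.inv_cdf(0.95)

    # Probability that the target exceeds a threshold of 1.0.
    p_above_one = 1.0 - dist.cdf(1.0)

    intervals.append((lower, upper, p_above_one))
    print(
        f"mean={mean:+.2f}  90% interval=[{lower:+.2f}, {upper:+.2f}]  "
        f"P(y > 1)={p_above_one:.3f}"
    )
```

Two rows with similar means can have very different intervals, which is exactly the information a single point estimate discards.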

Setup#

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from deeptab.models import MambularLSS

Generate data#

np.random.seed(42)  # fix the seed so the example is reproducible

n_samples, n_features = 1000, 5
X = np.random.randn(n_samples, n_features)

# Linear signal with random weights plus standard normal noise
weights = np.random.randn(n_features)
y = X @ weights + np.random.randn(n_samples)

df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(n_features)])
df["target"] = y

Split#

X = df.drop(columns=["target"])
y = df["target"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Train#

Pass the family argument to fit to choose the output distribution. Use "normal" for continuous, roughly symmetric targets. Other supported families include "poisson", "gamma", and "beta", among others.

model = MambularLSS()
model.fit(X_train, y_train, family="normal", max_epochs=10)

Evaluate#

metrics = model.evaluate(X_test, y_test)
print(metrics)
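The exact contents of metrics depend on the library, so the following is only a sketch of two quantities commonly used to judge a distributional model: the average negative log-likelihood and the empirical coverage of a prediction interval. It does not call deeptab. The arrays y_true, mu, and sigma are simulated stand-ins for test targets and predicted normal parameters, and the function names gaussian_nll and interval_coverage are hypothetical.

```python
import numpy as np


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2)."""
    var = sigma ** 2
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + (y - mu) ** 2 / (2 * var)))


def interval_coverage(y, mu, sigma, z=1.6449):
    """Fraction of targets inside mu +/- z * sigma (z=1.6449 is a central 90% interval)."""
    return float(np.mean(np.abs(y - mu) <= z * sigma))


# Simulated stand-ins: targets really are drawn from N(mu, sigma^2) here.
rng = np.random.default_rng(0)
mu = rng.normal(size=5000)
sigma = np.full(5000, 1.0)
y_true = mu + sigma * rng.normal(size=5000)

nll = gaussian_nll(y_true, mu, sigma)
coverage = interval_coverage(y_true, mu, sigma)

# An overconfident model (standard deviation too small) is penalised by the NLL.
nll_overconfident = gaussian_nll(y_true, mu, 0.5 * sigma)

print(f"NLL (calibrated):    {nll:.3f}")
print(f"NLL (overconfident): {nll_overconfident:.3f}")
print(f"90% interval coverage: {coverage:.3f}")
```

A well-calibrated model should show coverage close to the nominal level; coverage far below it suggests the predicted spread is too narrow.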

Note

The family argument controls which distribution parameters the model learns. For count data, try "poisson"; for strictly positive targets, try "gamma". See the API reference for the full list of supported families.
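As a rough illustration of this guidance, the hypothetical helper below maps simple properties of a target array to one of the family names mentioned in this guide. suggest_family is not part of deeptab, and its rules are heuristics only; in particular, suggesting "beta" for targets strictly between 0 and 1 is an assumption based on the support of the beta distribution, not on the library's documentation.

```python
import numpy as np


def suggest_family(y):
    """Heuristically suggest a family name from the range and type of the target.

    Hypothetical helper for illustration; always confirm the choice against
    the API reference and the actual shape of your data.
    """
    y = np.asarray(y, dtype=float)

    # Non-negative whole numbers look like counts.
    if np.all(np.mod(y, 1) == 0) and np.all(y >= 0):
        return "poisson"
    # Values strictly inside (0, 1) look like proportions.
    if np.all((y > 0) & (y < 1)):
        return "beta"
    # Strictly positive continuous values.
    if np.all(y > 0):
        return "gamma"
    # Fallback for continuous targets that can take any sign.
    return "normal"


print(suggest_family([0, 1, 3, 2]))
print(suggest_family([0.2, 0.9]))
print(suggest_family([1.5, 10.2]))
print(suggest_family([-1.0, 0.5]))
```

The returned string could then be passed as the family argument shown in the Train step, after checking that it matches how the target was actually generated.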

Using your own data#

import pandas as pd
from sklearn.model_selection import train_test_split
from deeptab.models import MambularLSS

# Replace the file path and the "target" column name with your own
df = pd.read_csv("your_data.csv")
X = df.drop(columns=["target"])
y = df["target"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MambularLSS()
model.fit(X_train, y_train, family="normal", max_epochs=50)
print(model.evaluate(X_test, y_test))

Next steps#