Observability: Logging, Tracking, and Run Directories
DeepTab can record everything that happens during training without you writing a single
callback. You attach an ObservabilityConfig to an estimator and every fit() captures its
hyperparameters, lifecycle events, and final metrics in one self-contained run directory.
Optional experiment trackers (TensorBoard, MLflow) and your own Lightning loggers build on the
same configuration.
This tutorial is deliberately exhaustive. We train the same model many times, changing one observability setting at a time, and after every run we show the resulting directory tree so you can see exactly what each setting produces on disk and on the console.
Note
The notebook linked above mirrors this tutorial. Use the markdown page for reading; use the notebook when you want to execute cells directly.
What you will learn
What a run with no observability does (and does not) leave behind.
How a minimal
ObservabilityConfigcreates an organised per-run directory:config.yaml,summary.json,checkpoints/.How
structured_loggingstreams lifecycle events to the console, and howverbosity(0-3) changes what you see.How
log_to_filewrites a machine-readablelifecycle.jsonlyou can load into a DataFrame.The exact folder trees produced by the TensorBoard and MLflow experiment trackers.
Three ways to bring your own logger: a Lightning logger through
ObservabilityConfig.logger, a directfit(logger=...)hand-off, and an in-process lifecycle-event sink.A side-by-side comparison of every case so you can pick the right settings for your workflow.
Note
For a quick demonstration these tutorials train with very low max_epochs and patience (5 and 2). Treat these as placeholders and choose values that match your own compute budget and problem. As a starting point, at least max_epochs=100 and patience=10 are recommended for meaningful results.
Important
Structured logging relies on structlog, and the experiment trackers need their own packages.
Install the optional extras you intend to use:
pip install 'deeptab[logs]'for structured logging (structlog).pip install 'deeptab[tensorboard]'for the TensorBoard tracker.pip install 'deeptab[mlflow]'for the MLflow tracker.
Setup
DeepTab and Lightning print a few framework banners on every fit (a device summary, a
parameter-count table) that are useful in isolation but drown out the observability messages this
tutorial is about. Raising those loggers to ERROR keeps the output focused; DeepTab’s own
events are emitted separately and are unaffected.
import contextlib
import json
import os
import re
import shutil
import sys
from pathlib import Path
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from deeptab.configs import TrainerConfig
from deeptab.core.observability import ObservabilityConfig
from deeptab.models import MLPClassifier
Every run in this tutorial writes under a single scratch directory so the examples stay isolated and easy to clean up. We recreate it from scratch on each execution so the trees you see below are reproducible.
WORKDIR = Path("obs_runs").resolve()
if WORKDIR.exists():
shutil.rmtree(WORKDIR)
WORKDIR.mkdir(parents=True)
print("Scratch directory:", WORKDIR)
A small synthetic binary-classification dataset is all we need. Observability behaves identically for regressors and distributional (LSS) models.
X, y = make_classification(
n_samples=800, n_features=8, n_informative=6, n_classes=2, random_state=42
)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=42
)
A few small helpers
show_tree prints a directory as an indented tree so we can inspect what each run produced.
latest_run returns the newest per-run directory. focused_output hides DeepTab’s per-feature
preprocessing summary (a plain print from the preprocessing layer) so that, when we look at
structured logging, the cell output stays on the observability messages. None of these helpers are
required to use observability; they only keep this tutorial readable.
def show_tree(root, title=None):
"""Print *root* as an indented directory tree."""
root = os.path.abspath(root)
if title:
print(title)
if not os.path.exists(root):
print(" (nothing was created here)")
return
for dirpath, dirnames, filenames in os.walk(root):
dirnames.sort()
depth = dirpath[len(root):].count(os.sep)
print(" " * depth + os.path.basename(dirpath) + "/")
for name in sorted(filenames):
print(" " * (depth + 1) + name)
def latest_run(root_dir, experiment_name):
"""Return the newest per-run directory under <root_dir>/runs/<experiment_name>/."""
runs = Path(root_dir) / "runs" / experiment_name
return sorted(runs.iterdir())[-1]
_NOISE = re.compile(r"^(Numerical Feature:|Categorical Feature:|Embedding Feature:|-{5,}\s*$)")
class _LineFilter:
"""A thin stdout wrapper that drops the preprocessor's per-feature summary lines."""
def __init__(self, target):
self._target = target
self._buf = ""
def write(self, text):
self._buf += text
while "\n" in self._buf:
line, self._buf = self._buf.split("\n", 1)
if not _NOISE.match(line):
self._target.write(line + "\n")
def flush(self):
self._target.flush()
@contextlib.contextmanager
def focused_output():
real = sys.stdout
sys.stdout = _LineFilter(real)
try:
yield
finally:
sys.stdout = real
We reuse one tiny TrainerConfig and a single train helper everywhere. The only thing that
changes between sections is the observability_config we hand to the estimator.
TRAINER = TrainerConfig(max_epochs=5, patience=2, batch_size=128)
def train(observability_config=None, **fit_kwargs):
"""Fit a fresh MLPClassifier, optionally with observability attached."""
model = MLPClassifier(
trainer_config=TRAINER,
random_state=42,
observability_config=observability_config,
)
with focused_output():
model.fit(X_train, y_train, enable_progress_bar=False, **fit_kwargs)
return model
1. The baseline: no observability
Observability is entirely opt-in. An estimator created without an ObservabilityConfig trains
exactly as before and emits no events. There is no run directory, no config.yaml, and no event
log. This is why notebooks stay quiet by default.
The only artifact a plain fit() leaves behind is the Lightning checkpoint that restores the best
weights. We point its default_root_dir at our scratch folder so it does not clutter the working
directory.
baseline = train(default_root_dir=str(WORKDIR / "01_no_observability"))
print("Fitted:", type(baseline).__name__)
print("Test accuracy:", round((baseline.predict(X_test) == y_test).mean(), 3))
show_tree(WORKDIR / "01_no_observability", "01_no_observability/")
01_no_observability/
checkpoints/
best_model.ckpt
Only a checkpoints/ directory with the best-epoch weights. Nothing was logged, nothing was
tracked. If you already run your own logging stack, this is the mode to use: DeepTab stays out of
the way.
2. A minimal ObservabilityConfig
The moment you attach an ObservabilityConfig (even an empty one), DeepTab creates a single
organised directory for the run. Every output path is derived from root_dir. With nothing else
enabled you already get the run’s hyperparameters (config.yaml), its final metrics
(summary.json), and the best checkpoint, all under a timestamped run folder.
obs_min = ObservabilityConfig(
root_dir=str(WORKDIR / "02_minimal"),
experiment_name="demo",
)
model = train(obs_min)
show_tree(WORKDIR / "02_minimal", "02_minimal/")
02_minimal/
runs/
demo/
20260613_092809_712e4b18/
config.yaml
summary.json
artifacts/
checkpoints/
best_model.ckpt
The run directory name combines a timestamp and a short random id
(<YYYYMMDD_HHMMSS>_<run_id>), so concurrent or repeated runs never overwrite each other. Reading
the two metadata files it wrote:
run = latest_run(WORKDIR / "02_minimal", "demo")
print("=== config.yaml ===")
print((run / "config.yaml").read_text())
print("=== summary.json ===")
print((run / "summary.json").read_text())
=== summary.json ===
{
"run_id": "712e4b18",
"model_class": "MLPClassifier",
"n_params": 78273,
"n_samples": 640,
"best_val_loss": 0.6822827458381653,
"best_epoch": null,
"n_epochs_run": 5,
"duration_min": 0.0058
}
config.yaml is the full, reloadable configuration of the estimator (model, preprocessing, and
trainer configs plus the random state). summary.json is the compact result: parameter count,
best validation loss, best epoch, epochs actually run, and wall-clock duration. Together they make
every run self-describing.
3. Structured logging and verbosity
Set structured_logging=True to stream named lifecycle events. By default they go to the console
as compact, column-aligned lines prefixed with the run id. verbosity controls which events
you see; higher levels are supersets of lower ones:
Level |
Emits |
|---|---|
|
Silent. |
|
Milestones: |
|
Level 1 plus |
|
Debug: every event. |
Watch how the same run prints progressively more as we raise verbosity from 1 to 3.
for level in (1, 2, 3):
print(f"\n===================== verbosity = {level} =====================")
obs = ObservabilityConfig(
root_dir=str(WORKDIR / f"03_verbosity_{level}"),
experiment_name="demo",
structured_logging=True,
verbosity=level,
)
train(obs)
===================== verbosity = 1 =====================
2026-06-13 09:46:39 [info] run=f67d60c0 fit.started model=MLPClassifier samples=640 features=8 seed=42
2026-06-13 09:46:39 [info] run=f67d60c0 model.created backbone=MLP params=78_273 num=8 cat=0 duration_min=0.0000
2026-06-13 09:46:39 [info] run=f67d60c0 train.completed best_epoch=null best_val_loss=0.6823 epochs_run=5 duration_min=0.0061
2026-06-13 09:46:39 [info] run=f67d60c0 fit.completed status=success model=MLPClassifier params=78_273 best_val_loss=0.6823 duration_min=0.0069
===================== verbosity = 2 =====================
2026-06-13 09:46:39 [info] run=d5d96374 fit.started model=MLPClassifier samples=640 features=8 seed=42
2026-06-13 09:46:39 [info] run=d5d96374 data.created train=512 val=128 num=8 cat=0 val_size=0.2000 duration_min=0.0004
2026-06-13 09:46:39 [info] run=d5d96374 model.created backbone=MLP params=78_273 num=8 cat=0 duration_min=0.0000
2026-06-13 09:46:39 [info] run=d5d96374 train.started epochs=5 batch=128 lr=null optimizer=Adam patience=2 val_size=0.2000
2026-06-13 09:46:40 [info] run=d5d96374 train.completed best_epoch=null best_val_loss=0.6823 epochs_run=5 duration_min=0.0051
2026-06-13 09:46:40 [info] run=d5d96374 fit.completed status=success model=MLPClassifier params=78_273 best_val_loss=0.6823 duration_min=0.0057
Each event carries structured context: fit.started records the sample and feature counts,
model.created the backbone and parameter count, train.completed the best validation loss and
epoch, and fit.completed the total duration. verbosity=2 adds the data-split and
training-setup events; verbosity=3 would add any finer-grained events such as save/load and
predict.
Tip
verbosity=0 keeps the run directory and metadata files but emits nothing to the console: useful
for sweeps where you want artifacts on disk without log spam.
4. Persisting events to lifecycle.jsonl
Console output is convenient for a single run, but for sweeps you want machine-readable records.
Set log_to_file=True and DeepTab writes one JSON object per event to lifecycle.jsonl inside the
run directory. Here we also set log_to_console=False so this run writes only to the file.
obs_file = ObservabilityConfig(
root_dir=str(WORKDIR / "04_with_file"),
experiment_name="demo",
structured_logging=True,
log_to_console=False,
log_to_file=True,
verbosity=3,
)
train(obs_file)
show_tree(WORKDIR / "04_with_file", "04_with_file/")
04_with_file/
runs/
demo/
20260613_092810_058d84e7/
config.yaml
lifecycle.jsonl
summary.json
artifacts/
checkpoints/
best_model.ckpt
The run folder now also contains lifecycle.jsonl. Because every record is a flat JSON object,
you can load a run straight into a DataFrame:
run = latest_run(WORKDIR / "04_with_file", "demo")
events = [json.loads(line) for line in (run / "lifecycle.jsonl").read_text().splitlines()]
pd.DataFrame(events)[["timestamp", "event", "run_id"]]
timestamp event run_id
0 2026-06-13T09:46:40 fit.started 29bef1c6
1 2026-06-13T09:46:40 data.created 29bef1c6
2 2026-06-13T09:46:40 model.created 29bef1c6
3 2026-06-13T09:46:40 train.started 29bef1c6
4 2026-06-13T09:46:40 train.completed 29bef1c6
5 2026-06-13T09:46:40 fit.completed 29bef1c6
Every record is tagged with the same run_id, so you can concatenate lifecycle.jsonl files from
many runs and compare them programmatically: training duration per configuration, best validation
loss per seed, and so on.
5. What each setting controls
The runtime-logging fields combine independently. This table summarises their effect; the sections above and below show each one in action.
Field |
Default |
Effect |
|---|---|---|
|
|
Base of the whole output tree. Point it at a path your pipeline already archives. |
|
|
Groups related runs under |
|
|
Master switch for lifecycle event emission (needs |
|
|
Stream compact event lines to stdout (only when |
|
|
Write |
|
|
Which events are emitted: |
|
|
Activate Lightning trackers: |
|
|
A user-provided Lightning logger appended alongside the trackers. |
Note
The run directory (config.yaml, summary.json, checkpoints/) is created whenever any
ObservabilityConfig is attached, regardless of the logging flags. The flags only add console
output, the event file, and trackers.
6. Experiment trackers
experiment_trackers turns on Lightning loggers that record metrics during training. DeepTab
resolves all of their paths under root_dir by default, so a tracker adds a sibling folder next to
runs/ rather than scattering files across your project.
TensorBoard
obs_tb = ObservabilityConfig(
root_dir=str(WORKDIR / "06_tensorboard"),
experiment_name="demo",
experiment_trackers=["tensorboard"],
)
train(obs_tb)
show_tree(WORKDIR / "06_tensorboard", "06_tensorboard/")
06_tensorboard/
runs/
demo/
20260613_094640_70f476cd/
config.yaml
summary.json
artifacts/
checkpoints/
best_model.ckpt
tensorboard/
demo/
20260613_094640_70f476cd/
events.out.tfevents...
hparams.yaml
Alongside the usual runs/ tree you now get a tensorboard/<experiment_name>/<run>/ folder with
the event file and hparams.yaml. Point TensorBoard at the tensorboard/ directory to explore the
curves:
tensorboard --logdir obs_runs/06_tensorboard/tensorboard
MLflow
The MLflow tracker defaults to a self-contained local store: a SQLite backend plus a file-based
artifact directory, both under root_dir.
obs_mlflow = ObservabilityConfig(
root_dir=str(WORKDIR / "07_mlflow"),
experiment_name="demo",
experiment_trackers=["mlflow"],
mlflow_experiment_name="deeptab-demo",
)
train(obs_mlflow)
show_tree(WORKDIR / "07_mlflow", "07_mlflow/")
07_mlflow/
mlflow/
artifacts/
950a0173cd2d4f799fa3267b07e77bf3/
artifacts/
config.yaml
summary.json
best_model/
aliases.txt
best_model.ckpt
metadata.yaml
checkpoints/
best_model.ckpt
backend/
mlflow.db
runs/
demo/
20260613_094641_259bbfef/
config.yaml
summary.json
artifacts/
checkpoints/
best_model.ckpt
MLflow stores run metadata in mlflow/backend/mlflow.db and uploads the run’s config.yaml,
summary.json, and the best checkpoint into mlflow/artifacts/<mlflow_run_id>/. DeepTab also logs
the flattened hyperparameters, dataset statistics, and final metrics to the MLflow run. Launch the
UI against the same SQLite file:
mlflow ui --backend-store-uri sqlite:///obs_runs/07_mlflow/mlflow/backend/mlflow.db
Set both trackers at once with experiment_trackers=["tensorboard", "mlflow"] to get both trees
from a single run.
7. Bring your own logger
If you already have a logging or experiment-tracking stack, DeepTab can hand off to it instead of (or alongside) its built-in trackers. There are three integration points, from most to least integrated.
7a. A Lightning logger through ObservabilityConfig.logger
Because DeepTab trains through PyTorch Lightning, any Lightning logger works. Pass an instance via
the logger field and DeepTab appends it to the trainer’s logger list. We use CSVLogger here
because it writes a folder you can see; the same pattern applies to WandbLogger, CometLogger,
NeptuneLogger, and friends.
from lightning.pytorch.loggers import CSVLogger
obs_byo = ObservabilityConfig(
root_dir=str(WORKDIR / "08_byo_logger"),
experiment_name="demo",
experiment_trackers=["tensorboard"], # see the note below: at least one tracker is required
logger=CSVLogger(save_dir=str(WORKDIR / "08_byo_logger" / "csv"), name="mlp"),
)
train(obs_byo)
show_tree(WORKDIR / "08_byo_logger", "08_byo_logger/")
08_byo_logger/
csv/
mlp/
version_0/
hparams.yaml
metrics.csv
runs/
demo/
20260613_094641_.../
config.yaml
summary.json
artifacts/
checkpoints/best_model.ckpt
tensorboard/
demo/
20260613_094641_.../
events.out.tfevents...
hparams.yaml
Your CSVLogger wrote csv/mlp/version_0/ (with metrics.csv and hparams.yaml) right next to
DeepTab’s own runs/ and tensorboard/ trees. A real tracker such as
WandbLogger(project="churn") would instead stream to your hosted dashboard while DeepTab keeps
owning the per-run artifact directory.
Important
The logger field is honoured only when experiment_trackers is non-empty. With an empty
experiment_trackers list DeepTab suppresses Lightning’s logger entirely (to avoid a stray
lightning_logs/ folder), and a logger you passed would be silently ignored. Pair your logger
with at least one tracker, or use the direct hand-off below.
To prove the point, here is the same custom logger with no tracker. Notice the run directory is
still created, but there is no csv/ folder: the logger was not attached.
obs_logger_only = ObservabilityConfig(
root_dir=str(WORKDIR / "09_logger_only"),
experiment_name="demo",
logger=CSVLogger(save_dir=str(WORKDIR / "09_logger_only" / "csv"), name="mlp"),
)
train(obs_logger_only)
show_tree(WORKDIR / "09_logger_only", "09_logger_only/ (no csv/, logger was ignored without a tracker)")
09_logger_only/ (no csv/, logger was ignored without a tracker)
runs/
demo/
20260613_094641_.../
config.yaml
summary.json
artifacts/
checkpoints/best_model.ckpt
7b. Hand a logger straight to fit()
Any keyword argument fit() does not recognise is forwarded to pl.Trainer, and an explicit
logger= overrides DeepTab’s default. This is the lightest-weight option: no ObservabilityConfig
at all, just your logger driving training. There is no DeepTab run directory in this mode, only
whatever your logger writes.
direct = MLPClassifier(trainer_config=TRAINER, random_state=42)
with focused_output():
direct.fit(
X_train, y_train,
enable_progress_bar=False,
logger=CSVLogger(save_dir=str(WORKDIR / "10_direct_logger"), name="mlp"),
)
show_tree(WORKDIR / "10_direct_logger", "10_direct_logger/")
10_direct_logger/
mlp/
version_0/
hparams.yaml
metrics.csv
7c. Consume the lifecycle events in-process
If you want DeepTab’s events (not just Lightning metrics) routed into your own system, attach
any object that exposes info(event: str, **kwargs). DeepTab dispatches every lifecycle event to
it. This is the same interface the built-in structlog backend implements, so a test double or an
adapter to your telemetry pipeline drops in cleanly.
Note
This attaches to the _event_logger hook directly, which is a lower-level integration point than
the ObservabilityConfig fields above. Use it when you need the structured events inside your own
process; use log_to_file=True and read lifecycle.jsonl when a file-based hand-off is enough.
class CollectingSink:
"""Minimal event sink: captures every lifecycle event in memory."""
def __init__(self):
self.events = []
def info(self, event, **kwargs):
self.events.append({"event": event, **kwargs})
sink = CollectingSink()
model = MLPClassifier(trainer_config=TRAINER, random_state=42)
model._event_logger = sink # attach a custom in-process event consumer
with focused_output():
model.fit(X_train, y_train, enable_progress_bar=False, default_root_dir=str(WORKDIR / "11_custom_sink"))
print("Captured events:")
for record in sink.events:
print(" ", record["event"], "->", {k: v for k, v in record.items() if k != "event"})
Captured events:
fit.started -> {'run_id': '0f1c8c6a', 'model_class': 'MLPClassifier', 'n_samples': 640, 'n_features': 8, 'random_state': 42}
data.created -> {'run_id': '0f1c8c6a', 'n_train': 512, 'n_val': 128, 'n_num_features': 8, 'n_cat_features': 0, 'val_size': 0.2, 'duration_min': 0.0004}
model.created -> {'run_id': '0f1c8c6a', 'backbone': 'MLP', 'n_params': 78273, 'n_num_features': 8, 'n_cat_features': 0, 'duration_min': 0.0}
train.started -> {'run_id': '0f1c8c6a', 'max_epochs': 5, 'batch_size': 128, 'lr': None, 'optimizer': 'Adam', 'patience': 2, 'val_size': 0.2}
train.completed -> {'run_id': '0f1c8c6a', 'best_epoch': None, 'best_val_loss': 0.6822827458381653, 'n_epochs_run': 5, 'duration_min': 0.0051}
fit.completed -> {'run_id': '0f1c8c6a', 'status': 'success', 'model_class': 'MLPClassifier', 'n_params': 78273, 'best_val_loss': 0.6822827458381653, 'duration_min': 0.0056}
Your sink received the full event stream with its structured payloads, ready to forward to whatever
backend you use. Because no ObservabilityConfig was attached, DeepTab created no run directory of
its own: your code is in full control.
8. Side-by-side: what each configuration leaves on disk
The trees below are the canonical shapes you can expect. Timestamps and ids vary per run; the structure does not.
No observability: only the best-weights checkpoint:
01_no_observability/
checkpoints/
best_model.ckpt
Minimal ObservabilityConfig: self-describing run directory:
02_minimal/
runs/demo/<timestamp>_<run_id>/
config.yaml # full estimator configuration
summary.json # final metrics
artifacts/ # reserved for run artifacts
checkpoints/
best_model.ckpt
structured_logging=True, log_to_file=True: adds the event log:
04_with_file/
runs/demo/<timestamp>_<run_id>/
config.yaml
lifecycle.jsonl # one JSON event per line
summary.json
artifacts/
checkpoints/
best_model.ckpt
experiment_trackers=["tensorboard"]: adds a TensorBoard tree:
06_tensorboard/
runs/demo/<timestamp>_<run_id>/
config.yaml
summary.json
artifacts/
checkpoints/best_model.ckpt
tensorboard/demo/<timestamp>_<run_id>/
events.out.tfevents...
hparams.yaml
experiment_trackers=["mlflow"]: adds a local MLflow store:
07_mlflow/
runs/demo/<timestamp>_<run_id>/
config.yaml
summary.json
artifacts/
checkpoints/best_model.ckpt
mlflow/
backend/mlflow.db # run metadata (SQLite)
artifacts/<mlflow_run_id>/artifacts/
config.yaml
summary.json
best_model/... # logged model checkpoint
checkpoints/best_model.ckpt
logger=... + a tracker: your Lightning logger sits beside DeepTab’s trees:
08_byo_logger/
csv/mlp/version_0/
hparams.yaml
metrics.csv
runs/demo/<timestamp>_<run_id>/...
tensorboard/demo/<timestamp>_<run_id>/...
When to use which
Quick experiments / notebooks: no observability, or
verbosity=1for a few milestone lines.Reproducible runs you may revisit: minimal
ObservabilityConfigso every run keeps itsconfig.yamlandsummary.json.Sweeps and comparisons:
structured_logging=True, log_to_file=True, verbosity=2, then load eachlifecycle.jsonlinto a DataFrame.Dashboards: add
experiment_trackers=["tensorboard"]or["mlflow"].Existing stack: pass your Lightning logger via
logger=(with a tracker), hand it tofit(logger=...), or attach an in-process event sink.
Cleanup
The scratch directory is disposable. Remove it so re-running the examples starts clean (it is also git-ignored).
shutil.rmtree(WORKDIR, ignore_errors=True)
print("Removed", WORKDIR)
Next steps
Observability (core concept): the configuration reference and design notes.
Advanced training: optimizers, schedulers, callbacks, and
InferenceModelin production.Hyperparameter optimization: run sweeps whose results you can track with the tools above.