Parsing Histograms

This page demonstrates the parsing process for histogram events.

Preparing Sample Event Logs

First, let’s import some libraries and prepare the environment for our sample event logs:

>>> import os
>>> import tempfile
>>> import numpy as np
>>> # Define some constants
>>> RND_STATE = 1234
>>> N_EVENTS = 10
>>> N_PARTICLES = 1000
>>> MU = 0
>>> SIGMA = 2
>>> # Prepare temp dirs for storing event files
>>> tmpdirs = {}

Before parsing a event file, we need to generate it first. The sample event files are generated by three commonly used event log writers.

We can generate the events by PyTorch:

>>> tmpdirs['torch'] = tempfile.TemporaryDirectory()
>>> from torch.utils.tensorboard import SummaryWriter
>>> log_dir = tmpdirs['torch'].name
>>> writer = SummaryWriter(log_dir)
>>> rng = np.random.RandomState(RND_STATE)
>>> for i in range(N_EVENTS):
...   x = rng.normal(MU, SIGMA, size=N_PARTICLES)
...   writer.add_histogram('dist', x + i, i)
>>> writer.close()

and quickly check the results:

>>> from tbparse import SummaryReader
>>> SummaryReader(log_dir, pivot=True).histograms.columns
Index(['step', 'dist/counts', 'dist/limits'], dtype='object')

The event logs can be easily read in 2 lines of code as shown above (1 for importing tbparse, 1 for reading the events).

Parsing Event Logs

In different use cases, we will want to read the event logs in different styles. We further show different configurations of the tbparse.SummaryReader class.

Load Event File / Run Directory

>>> from tbparse import SummaryReader
>>> log_dir = tmpdirs['torch'].name
>>> # Long Format
>>> df = SummaryReader(log_dir).histograms
>>> df.columns
Index(['step', 'tag', 'counts', 'limits'], dtype='object')
>>> # Wide Format
>>> df = SummaryReader(log_dir, pivot=True).histograms
>>> df.columns
Index(['step', 'dist/counts', 'dist/limits'], dtype='object')

Warning

When accessing SummaryReader.histograms, the events stored in each event file are collected internally. The best practice is to store the returned results in a DataFrame as shown in the samples, instead of repeatedly accessing SummaryReader.histograms.

Extra Columns

See the Extra Columns page for more details.

Plotting Events

We further demonstrate some basic filtering techniques for plotting our data.

Plotting a Distribution

The data from tensorboard event logs:

import matplotlib.pyplot as plt
from tbparse import SummaryReader

log_dir = tmpdirs['torch'].name
reader = SummaryReader(log_dir, pivot=True)
df = reader.histograms
df.set_index('step', inplace=True)
counts0 = df.at[0, 'dist/counts']
limits0 = df.at[0, 'dist/limits']
# draw PDF
x = np.linspace(limits0[0], limits0[-1], 11)
x, y = SummaryReader.histogram_to_pdf(counts0, limits0, x)
plt.plot(x, y)
# draw CDF
x = np.linspace(limits0[0], limits0[-1], 1000)
y = SummaryReader.histogram_to_cdf(counts0, limits0, x)
plt.plot(x, y)
plt.title('dist0')
plt.show()
../_images/parsing-histograms-5.png

Reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_histogram.html

Plotting Multiple (Stacked) Distributions

import seaborn as sns
import matplotlib.pyplot as plt
log_dir = tmpdirs['torch'].name
reader = SummaryReader(log_dir, pivot=True)
df = reader.histograms
# Set background
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Choose color palettes for the distributions
pal = sns.color_palette("Oranges", 20)[5:-5]
# Initialize the FacetGrid object (stacking multiple plots)
g = sns.FacetGrid(df, row='step', hue='step', aspect=15, height=.4, palette=pal)

def plot_subplots(x, color, label, data):
  ax = plt.gca()
  ax.text(0, .08, label, fontweight="bold", color=color,
          ha="left", va="center", transform=ax.transAxes)
  counts = data['dist/counts'].iloc[0]
  limits = data['dist/limits'].iloc[0]
  x = np.linspace(limits[0], limits[-1], 15)
  x, y = SummaryReader.histogram_to_pdf(counts, limits, x)
  # Draw the densities in a few steps
  sns.lineplot(x=x, y=y, clip_on=False, color="w", lw=2)
  ax.fill_between(x, y, color=color)
# Plot each subplots with df[df['step']==i]
g.map_dataframe(plot_subplots, None)

# Add a bottom line for each subplot
# passing color=None to refline() uses the hue mapping
g.refline(y=0, linewidth=2, linestyle="-", color=None, clip_on=False)
# Set the subplots to overlap (i.e., height of each distribution)
g.figure.subplots_adjust(hspace=-.9)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[], xlabel="", ylabel="")
g.despine(bottom=True, left=True)
../_images/parsing-histograms-9.png

Reference: https://seaborn.pydata.org/examples/kde_ridgeplot.html

Plotting Multiple (Stacked) Histograms

import seaborn as sns
import matplotlib.pyplot as plt
log_dir = tmpdirs['torch'].name
reader = SummaryReader(log_dir, pivot=True)
df = reader.histograms
# Set background
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Choose color palettes for the distributions
pal = sns.color_palette("Oranges", 20)[5:-5]
# Initialize the FacetGrid object (stacking multiple plots)
g = sns.FacetGrid(df, row='step', hue='step', aspect=15, height=.4, palette=pal)

def plot_subplots(x, color, label, data):
  ax = plt.gca()
  ax.text(0, .08, label, fontweight="bold", color=color,
          ha="left", va="center", transform=ax.transAxes)
  counts = data['dist/counts'].iloc[0]
  limits = data['dist/limits'].iloc[0]
  x, y = SummaryReader.histogram_to_bins(counts, limits, limits[0], limits[-1], 15)
  # Draw the densities in a few steps
  sns.lineplot(x=x, y=y, clip_on=False, color="w", lw=2)
  ax.fill_between(x, y, color=color)
# Plot each subplots with df[df['step']==i]
g.map_dataframe(plot_subplots, None)

# Add a bottom line for each subplot
# passing color=None to refline() uses the hue mapping
g.refline(y=0, linewidth=2, linestyle="-", color=None, clip_on=False)
# Set the subplots to overlap (i.e., height of each distribution)
g.figure.subplots_adjust(hspace=-.9)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[], xlabel="", ylabel="")
g.despine(bottom=True, left=True)
../_images/parsing-histograms-15.png

SummaryReader.histogram_to_bins aims to reproduce the visualization in tensorboard dashboard.

Reference: https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/histogram/tf_histogram_dashboard/histogramCore.ts#L83