Soft Brownian Offset

Introduction
Soft Brownian Offset (SBO) is an iterative approach for translating points away from a given dataset until they reach a desired minimum distance. It can be used to generate out-of-distribution (OOD) samples. It builds on Gaussian Hyperspheric Offset (GHO), which is also included in this package (see below).
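To make the idea concrete, here is a conceptual sketch of the translation step, written only with NumPy: a sample starts at a point of the dataset and takes random offsets of a fixed length until it lies at least a minimum distance away from every dataset point. The helper `sbo_sketch` is hypothetical and simplified for illustration; it is not the library's implementation (the actual algorithm additionally biases the walk and supports a softness parameter).

```python
import numpy as np

def sbo_sketch(X, d_min, d_off, n_samples, rng=None):
    """Toy illustration of the SBO idea (hypothetical helper, not the sbo API):
    start from a random in-distribution point and apply random offsets of
    length d_off until the point is at least d_min away from all of X."""
    rng = np.random.default_rng(rng)
    out = []
    for _ in range(n_samples):
        # start from a randomly chosen dataset point
        p = X[rng.integers(len(X))].astype(float)
        # take Brownian-motion-like steps until far enough from the data
        while np.min(np.linalg.norm(X - p, axis=1)) < d_min:
            step = rng.normal(size=X.shape[1])
            p += d_off * step / np.linalg.norm(step)
        out.append(p)
    return np.array(out)

X = np.random.default_rng(0).normal(size=(60, 2))
X_ood = sbo_sketch(X, d_min=.35, d_off=.24, n_samples=10, rng=0)
# by construction, every generated sample is at least d_min away from X
```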
Installation

This project is hosted on PyPI and can therefore be installed easily through pip:

```shell
pip install sbo
```

Depending on your setup you may need to append --user to the install command.
Usage

For brevity's sake, here is a short introduction to the library's usage:
```python
from sklearn.datasets import make_moons
from sbo import soft_brownian_offset

X, _ = make_moons(n_samples=60, noise=.08)
X_ood = soft_brownian_offset(X, d_min=.35, d_off=.24, n_samples=120, softness=0)
```
Background

The technique allows for trivial OOD generation, as shown above, or for more complex schemes that apply the transformation to learned representations. For an in-depth look at the latter, please refer to the paper, which is available as open access from the CVF. For citation information, please see the Cite section below.
Parameter overview

The following plot gives an overview of possible choices for d_min (\(d^-\)), d_off (\(d^+\)) and softness (\(\sigma\)):

It was created using the following Python code:
```python
#!/usr/bin/env python3
# Creates a plot for Soft Brownian Offset (SBO)
import itertools

import numpy as np
import pylab as plt
from matplotlib import cm
from sklearn.datasets import make_moons
from sklearn.metrics import pairwise_distances
from sbo import soft_brownian_offset

plt.rc('text', usetex=True)
c = cm.tab10.colors


def plot_data(X, y, ax=plt):
    ax.scatter(X[:, 0], X[:, 1], marker='x', s=20, label='ID', alpha=alpha, c=[c[-1]])
    ax.scatter(y[:, 0], y[:, 1], marker='+', label='SBO', alpha=alpha, c=[c[-6]])


def plot_mindist(X, y, ax=plt):
    if len(X.shape) == 1:
        X = X[:, None]
    if len(y.shape) == 1:
        y = y[:, None]
    ax.hist(pairwise_distances(y, X).min(axis=1), bins=len(y) // 10)
    ax.set_xlabel("Minimum distance from OOD to ID")
    ax.set_ylabel("Count")


def plot_data_mindist(X, y):
    fig, ax = plt.subplots(1, 2)
    plot_data(X, y, ax=ax[0])
    plot_mindist(X, y, ax=ax[1])
    plt.show()


n_samples_id = 60
n_samples_ood = 150
noise = .08
show_progress = False
alpha = .6
n_colrow = 3

d_min = np.linspace(.25, .45, n_colrow)
softness = np.linspace(0, 1, n_colrow)

fig, ax = plt.subplots(n_colrow, n_colrow, sharex=True, sharey=True, figsize=(8.5, 9))
X, _ = make_moons(n_samples=n_samples_id, noise=noise)

for i, (d_min_, softness_) in enumerate(itertools.product(d_min, softness)):
    xy = i // n_colrow, i % n_colrow
    d_off_ = d_min_ * .7
    ax[xy].set_title(rf"$d^- = {d_min_:.2f}\ d^+ = {d_off_:.2f}\ \sigma = {softness_}$")
    if softness_ == 0:
        softness_ = False
    y = soft_brownian_offset(X, d_min_, d_off_, n_samples=n_samples_ood,
                             softness=softness_, show_progress=show_progress)
    plot_data(X, y, ax=ax[xy])
    if i // n_colrow == len(d_min) - 1:
        ax[xy].set_xlabel("$x_1$")
    if i % n_colrow == 0:
        ax[xy].set_ylabel("$x_2$")

ax[0, n_colrow - 1].legend(loc='upper right')
plt.tight_layout()
plt.show()
```
Gaussian Hyperspheric Offset

GHO is the basis for SBO and assumes \(\pmb{X}\sim\mathcal{N}\). The result of the following code displays the shortcomings of GHO when that assumption does not hold:

```python
from sklearn.datasets import make_moons
from sbo import gaussian_hyperspheric_offset

X, _ = make_moons(n_samples=60, noise=.08)
# n_dim is the dimensionality of the feature space, i.e. X.shape[1]
X_ood = (gaussian_hyperspheric_offset(n_samples=220, mu=2, std=.3, n_dim=X.shape[1]) + X.mean()) * X.std()
```
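For intuition, the underlying sampling scheme can be sketched in plain NumPy: draw directions uniformly on the unit hypersphere and radii from a Gaussian \(\mathcal{N}(\mu, \sigma)\), so the samples form a fuzzy shell around the origin. The helper `gho_sketch` below is a hypothetical illustration under that reading, not the library's `gaussian_hyperspheric_offset` itself.

```python
import numpy as np

def gho_sketch(n_samples, mu, std, n_dim, rng=None):
    """Conceptual sketch of Gaussian Hyperspheric Offset (hypothetical helper):
    uniform directions on the unit hypersphere times Gaussian radii."""
    rng = np.random.default_rng(rng)
    # normalizing Gaussian vectors yields uniformly distributed directions
    v = rng.normal(size=(n_samples, n_dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # offset distances drawn from N(mu, std)
    r = rng.normal(mu, std, size=(n_samples, 1))
    return v * r

X_ood = gho_sketch(n_samples=220, mu=2, std=.3, n_dim=2, rng=0)
# the norms of the samples concentrate around mu
```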
Cite

Please cite SBO in your paper if it helps your research:

```bibtex
@inproceedings{MBH21,
    author = {Möller, Felix and Botache, Diego and Huseljic, Denis and Heidecker, Florian and Bieshaar, Maarten and Sick, Bernhard},
    booktitle = {{Proc. of CVPR SAIAD Workshop}},
    title = {{Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders}},
    year = 2021
}
```