Generative black boxes
In this post, I want to explore the idea of querying a generative black box as a way to get infinite labelled training data.
A generative black box is any program P that can generate a data vector x given a description vector (or code vector) y.
It could be, for example, a 3D graphics engine. In that case, y would be a scene description (lights, objects in the scene, position of the camera) and x would be the generated image.
We could consider P (in our example, the graphics program) as an input to a learning procedure.
This type of setup is similar to a Helmholtz Machine (HM) and to Analysis by Synthesis (ABS). For a good introduction, I recommend:
P can be turned into an infinite labelled training set by taking any valid y(i), feeding it into P to get x(i) = P(y(i)), thus creating a labelled set of (y, x) pairs.
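To make this concrete, here is a minimal sketch: a toy linear `render` stands in for P, and a naive uniform `sample_code` proposes y's. Both names and the whole setup are hypothetical placeholders, not any real engine.

```python
import numpy as np

# Toy stand-in for the black box P: a fixed random linear "renderer"
# mapping a 3-dim code y to a 16-dim data vector x. The mapping and
# the dimensions are made up for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 3))

def render(y):
    """P: description vector y -> data vector x."""
    return W @ y

def sample_code():
    """Naively propose a code y; doing this sensibly is the open question below."""
    return rng.uniform(-1.0, 1.0, size=3)

def labelled_pairs():
    """Infinite stream of (y, x) pairs obtained by querying P."""
    while True:
        y = sample_code()
        yield y, render(y)

y0, x0 = next(labelled_pairs())  # one labelled example, for free
```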
One obvious immediate question is how to get valid (and more importantly, plausible and meaningful) y's. Not all scene descriptions are equally plausible: some might be physically impossible, or nonsensical.
The breeder learning algorithm takes the following approach. We need an extra set X of n data vectors {x1, x2, ..., xn}. We also have a recognition network Rw (e.g. a feedforward neural network). The idea is to recognize the extra x's with Rw to get sensible y's, and then to play around with those y's. In the paper, "playing around" consists of:
perturbing the y's to get y'(i); regenerating from them to get x'(i) = P(y'(i)); re-recognizing to get y''(i) = Rw(x'(i)); and learning from the pair (x'(i), y'(i)), with the error between y''(i) and y'(i) driving the weight update.
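Put as code, one iteration might look like the sketch below. The `recognize` and `update` callables and the noise scale are my assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb(y, scale=0.1):
    """Add small Gaussian noise to a code vector y (scale is an assumption)."""
    return y + rng.normal(scale=scale, size=y.shape)

def breeder_step(x, render, recognize, update):
    """One breeder-learning iteration on a single unlabelled example x.

    `render` is the black box P, `recognize` the current recognition
    network Rw, and `update` any gradient step on its weights; all three
    are placeholder callables, not the paper's code.
    """
    y = recognize(x)            # recognize the real x to get a sensible y
    y_prime = perturb(y)        # perturb y to get y'
    x_prime = render(y_prime)   # regenerate: x' = P(y')
    # re-recognizing x' gives y'' = recognize(x'); learning means pushing
    # y'' towards the known code y', i.e. training on the pair (x', y')
    update(x_prime, y_prime)
```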
This approach assumes that an unlabelled dataset X is available. What could we do without such a dataset? Is it possible to learn a model of P in an exploratory manner? Probably. I think we can turn to Schmidhuber's formal theory of fun and creativity for that.