Warning: The ideas discussed here are half-baked, and I do not understand them fully.

In this post, I want to elaborate on the view that the brain builds a model of the world. Not a sensory-motor model, even though senses and motor commands are definitely part of it, but a model of the structure of the world, and probably one isomorphic to it.

Next, I want to emphasize that structure is important, not labels. Concepts that you truly understand, you can regenerate. Naming things is just a matter of linking the concept to a label (the linking is similar to the fine-tuning described here).

To take a concrete example, consider the expression “painting yourself in a corner”. For a long time, I thought of it as “painting a self-portrait on the wall, near a corner of the room”. That doesn’t make much sense, but it didn’t bother me; I just cached the meaning and went along with it.

It turns out that viewing the expression as “spraying paint all over the floor while standing in a corner” makes a lot more sense. But to see why, you need knowledge about:

  • rooms
  • 2D space
  • room geometry:
    • what a corner is
    • how a room has walls and you cannot go through them
  • paint physics:
    • what fresh paint is
    • how paint takes a while to dry
    • how you cannot walk on fresh paint (or at least bad things will happen if you do)
  • painting logic:
    • how, when you paint, you usually spray the whole surface
  • body topology:
    • how, when you paint, you have a limited reach

With all that in mind, you can see how the correct representation might be computable (i.e., how an algorithm or computational agent could assemble it). With the correct representation, the “essence” of the expression, “getting yourself stuck”, appears somewhat magically.
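
To make “computable” slightly more concrete, here is a toy sketch, assuming a crude fact-set representation: the predicate names and the rule below are inventions for illustration, not a claim about how the brain stores anything.

```python
# Toy sketch: the scene as a set of facts, "stuck" as a derived fact.
# All predicate names are invented for illustration.

scene = {
    "standing_in_corner",          # room geometry: what a corner is
    "walls_are_solid",             # you cannot go through walls
    "floor_fully_painted",         # painting logic: the whole surface gets sprayed
    "paint_is_fresh",              # paint physics: paint takes a while to dry
    "cannot_walk_on_fresh_paint",  # or at least bad things happen if you do
    "limited_reach",               # body topology
}

def is_stuck(facts):
    """The "essence" appears when all the blocking facts hold at once:
    every way out crosses fresh paint, and nothing lets you bypass it."""
    blocking = {
        "standing_in_corner",
        "walls_are_solid",
        "floor_fully_painted",
        "paint_is_fresh",
        "cannot_walk_on_fresh_paint",
    }
    return blocking <= facts  # subset test: all blocking facts are present

print(is_stuck(scene))                       # True: the idiom's meaning
print(is_stuck(scene - {"paint_is_fresh"}))  # False: dry paint, no problem
```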

I think a good question to ask now is this: what kind of knowledge representation would allow you to do the following with such ease?

  • Switch from one representation to the other (self-portrait versus spraying the floor)
  • Pick just the right list of relevant facts to extract the “essence”
  • Scan possible “parses” or “interpretations” and evaluate them (spraying the floor being much better than the self-portrait; a toy version of this scan is sketched below)

(Another related question: is that the right order of the steps? Could the “essence extraction” of being stuck happen before finding the right representation of spraying the floor? I do not think so, but it seems at least somewhat plausible.)
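
Here is the toy version of the scan-and-evaluate step mentioned above; the candidate interpretations, their features, and the scoring function are all stand-ins I made up, not a model of the real process:

```python
# Toy sketch: enumerate candidate interpretations of the expression and
# score each by how well its consequences hang together. The features and
# scores are hand-assigned stand-ins for a real plausibility model.

interpretations = {
    "self_portrait_near_corner": {
        "consistent_physics": True,
        "explains_why_its_bad": False,  # nothing bad follows from it
    },
    "spraying_floor_from_corner": {
        "consistent_physics": True,
        "explains_why_its_bad": True,   # predicts "getting stuck"
    },
}

def score(features):
    # Count how many desiderata the interpretation satisfies.
    return sum(features.values())

best = max(interpretations, key=lambda name: score(interpretations[name]))
print(best)  # spraying_floor_from_corner
```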

And to make things worse: finding the “right representation” draws on a lot of physical and logical knowledge. It is not just an association of concepts, but a complicated clockwork with moving parts: if you could fly, if there were a door behind you, if you had no problem standing there for a few days, or if you painted in strips, then the expression would lose its meaning. What’s more, we can easily see which parameters make the situation better or worse: the larger the painted surface, the slower the paint dries, or the more expensive your shoes, the worse the situation gets.
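
The clockwork character is easy to caricature in code. In this toy severity function, the weights and the functional form are arbitrary; the only point is that any single escape hatch zeroes the meaning out, while the continuous parameters scale it:

```python
# Toy sketch: the idiom's "badness" as a function of the scene's parameters.
# Weights and units are arbitrary; only the overall shape matters.

def severity(surface_m2, drying_hours, shoe_price,
             can_fly=False, door_behind_you=False,
             painted_in_strips=False, happy_to_wait_days=False):
    # Any escape hatch makes the expression lose its meaning entirely.
    if can_fly or door_behind_you or painted_in_strips or happy_to_wait_days:
        return 0.0
    # Larger surface, slower-drying paint, pricier shoes: worse.
    return surface_m2 * drying_hours + shoe_price

print(severity(20, 4, 150))                        # 230.0: properly stuck
print(severity(20, 4, 150, door_behind_you=True))  # 0.0: no problem at all
```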

What kind of data structure or algorithm has any chance of achieving that? Currently, I see none.

But let’s try a wild guess.

Let’s say we have a slow and conscious reasoning process with a very limited working memory (say, tens of bits). The slow reasoning process then queries a few generative black boxes, and their reversed versions. A generative black box may be a physics engine, a psychological engine, or a graphics engine. A query may be: what picture do I get if I initialize this scene in my graphics engine with two lights instead of one, with the supplied parameters? Or with a much dimmer light? And a query to the inverse might be: where is the source of light in this image?

The querying ideas come from this paper: Schmidhuber, “On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models”.
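
A minimal sketch of the loop I have in mind, assuming stubbed-out engines; every class and method name below is a placeholder, not the paper’s architecture:

```python
# Minimal sketch: a slow reasoner querying a generative black box and its
# inverse. The engines are stubs; real ones would be learned world models.

class GraphicsEngine:
    def render(self, scene_params):
        """Forward query: scene parameters -> image (stubbed)."""
        return f"image({scene_params})"

class InverseGraphicsEngine:
    def infer_light_source(self, image):
        """Inverse query: image -> plausible light position (stubbed)."""
        return {"light_position": "upper_left"}

def reason(engine, inverse):
    # Working memory is tiny: only a handful of results live here at once.
    memory = []
    # Query: what picture do I get with two lights instead of one?
    memory.append(engine.render({"lights": 2, "brightness": "normal"}))
    # Query: and with a much dimmer light?
    memory.append(engine.render({"lights": 2, "brightness": "dim"}))
    # Inverse query: where is the source of light in this image?
    memory.append(inverse.infer_light_source(memory[-1]))
    # Inspect the results, draw conclusions, query further...
    return memory

print(reason(GraphicsEngine(), InverseGraphicsEngine()))
```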

In this view:

  • Reasoning is a sequence of queries to the many black boxes and their inverses. Progress is made by inspecting the result of each query, drawing conclusions, and querying further.
  • Understanding is finding a representation: a set of parameters to fit into a given model (spraying the floor or the self-portrait).
  • An analogy is the common part between two representations (a toy version is sketched below).
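
In code, the last two bullets might look like this; the flat parameter dicts, and the second idiom, are made up purely for illustration:

```python
# Toy sketch: "understanding" as parameters fitted into a model, and
# "analogy" as the common part of two such representations.
# Both representations and the second idiom are made up for illustration.

painted_in_corner = {"agent": "you", "surface": "floor",
                     "obstacle": "fresh paint", "way_back_blocked": True}
burned_bridges = {"agent": "you", "surface": "bridge",
                  "obstacle": "fire", "way_back_blocked": True}

def analogy(rep_a, rep_b):
    """Shared (key, value) pairs: the structure both situations have."""
    return dict(rep_a.items() & rep_b.items())

print(analogy(painted_in_corner, burned_bridges))
# {'agent': 'you', 'way_back_blocked': True} (order may vary):
# both situations reduce to "you cut off your own way back".
```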

As for the details of how the querying is done, how structure is represented, and how information is shared between the reasoning process and the models: I have no idea.