[under construction] Human intelligence appears to require
many concepts (that work well enough together), for instance
to achieve daily 'common sensical' behavior. There are two
distinct but inter-related problems here, in the way I have
understood and approached these issues:
- [Time snapshot] How does one (a person), efficiently and adequately,
figure out (mostly unconsciously) which of one's many concepts
are useful at any time point (for instance when looking at a picture)?
- [History/Development] Where do these many concepts
come from, in the first place? (and where are they going?!)
A few other ensuing or related questions include: what
is a concept? and, for me, how can machine learning (ml), and
more broadly, computational thinking, help? (eg with the
wealth of learning and inference techniques that continue to
be developed)
Cognitive scientists tell us that most concepts (such as: water, chair,
my friend's voice, a house, my house, democracy, ...) develop
over time, in a sequential and cumulative
manner. This development appears to be largely unsupervised,
ie with no explicit teacher. By the time a child shows signs of
learning a language, many such concepts, if they are to be
useful, must already have been developed (somewhat), as the
child can already do a lot in her/his world. In the 2000s, I proposed
Prediction Games (PGs) to develop and study such
systems. In PGs, a system composed of
multiple learning and inferencing parts, given its input stream
(broken into episodes), plays the game of prediction
on it. The system tries to get better at the prediction task
over time: predicting more extensively, into the future or into
space, with possibly less input (becoming faster and more
powerful). This has a plausible survival advantage! Thus,
prediction, of one's world, is a unifying task: could it be
sufficient for providing the feedback needed to achieve the
conceptual complexity of humans (assuming learning is the main
vehicle of development)? But of course, we need more details.
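The prediction-game loop can be sketched minimally. The class name, the episode format, and the simple co-occurrence predictor below are illustrative assumptions for exposition, not the actual PGs machinery:

```python
# A minimal sketch of a prediction-game loop: the system predicts the
# next concept in each episode, then learns from what actually occurred.
# (Hypothetical names/interfaces; the real PGs systems are far richer.)
from collections import defaultdict

class PredictionGame:
    def __init__(self):
        # co-occurrence counts: counts[a][b] = times concept b followed a
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, concept):
        """Predict the most likely next concept given the current one."""
        followers = self.counts[concept]
        return max(followers, key=followers.get) if followers else None

    def play_episode(self, episode):
        """Predict each next concept, then update from the outcome.

        Returns the fraction of correct predictions in this episode."""
        correct = 0
        for cur, nxt in zip(episode, episode[1:]):
            if self.predict(cur) == nxt:
                correct += 1
            self.counts[cur][nxt] += 1  # learn only after predicting
        return correct / max(len(episode) - 1, 1)
```

On a stream with regularities, the per-episode accuracy returned by `play_episode` rises over time, which is the sense in which the system "gets better at the prediction task."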
In order to get better at prediction, the PGs system keeps
expanding its hierarchical networked vocabulary of
concepts. In PGs, concepts are both the predictors and
the predictands (ie the targets of prediction). This
symmetry is a major draw of this approach to
me.
Furthermore, concepts not only predict one another but are also
built out of one another, akin to Lego pieces. This is the
cumulative (constructivist!) part of the approach. To start
the whole process of learning, the system is given an initial
set of primitive (innate/hardwired) concepts, with the
capability to break its raw sensed input in an episode into
those primitives. So each episode begins with a sequence of
primitives in the input buffer. In order to predict with
concepts, the system needs to break or map its buffer contents
into its current most useful concepts. This process is
called interpretation (prediction and interpretation
are intertwined). By practicing many interpretation episodes,
the system (continually) figures out which concepts (new and
old) go together: predict each other, and could perhaps be
joined to make larger more useful concepts. PGs involve a
continual self-supervised and cumulative online learning
process.
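As a toy illustration of interpretation, a buffer of character primitives can be segmented against the current vocabulary by greedy longest match. The function name, the greedy strategy, and the string representation are assumptions made for illustration; the actual systems make these local decisions from noisy predictive evidence rather than exact matching:

```python
# A toy sketch of interpretation: mapping a buffer of primitive concepts
# (single characters) onto the current vocabulary of composite concepts,
# preferring the longest known concept that matches at each position.
def interpret(buffer, vocabulary):
    """Segment `buffer` into known concepts, falling back to primitives."""
    parts, i = [], 0
    while i < len(buffer):
        # try the longest candidate first; a single primitive always matches
        for length in range(len(buffer) - i, 0, -1):
            piece = buffer[i:i + length]
            if piece in vocabulary or length == 1:
                parts.append(piece)
                i += length
                break
    return parts

# primitives are characters; composites were joined in earlier episodes
vocab = {"th", "the", "cat"}
print(interpret("thecat", vocab))  # ['the', 'cat']
```

Note how adding the composite "the" changes the interpretation of the whole buffer, which in turn changes what predicts what: this is one way interpretation and prediction are intertwined.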
I currently believe the research is most relevant to learning in
perception. Over the
years, I have built a few versions of such systems that play
the game on one-dimensional text (see the pointers below).
There are many challenges: How do we avoid combinatorial
explosion (because such systems work with explicit
structures)? These systems make local decisions, eg in
determining which concepts to activate in a given episode, or
which concepts to join to make composite concepts, based on
noisy indirect information. There is much uncertainty (hence
the picture above of a house of cards!): can the system build
a robust networked hierarchy of concepts? How do we design
algorithms and objectives so that learning can go on and not
get stuck in poor local optima? When or why should such
systems succeed over time? There are also questions of
(code/engineering) complexity and of how to control and
understand the dynamics of interacting subsystems: for instance,
how biased will such a system, which learns sequentially, be?
(What it learns now affects its future choices and learning.)
There are many problems and
subproblems to be discovered and defined here. Along the way,
problems of philosophical nature arise too, as the system is
building its own (biased) reality in a sense (so then, is
there 'one unique objective truth out there'?). I think this
makes for a good project and research area!
Our (human) thoughts are built on concepts (and embodied!). A few
references:
- The Big Book of Concepts. Gregory L. Murphy, MIT Press, 2002.
- Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. George Lakoff and Mark Johnson, Basic Books, 1999.
- Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Douglas R. Hofstadter and Emmanuel Sander, Basic Books, 2013.
[More material (pointer and papers) to go here. ]
My work on PGs began with the problems of learning under large
and growing output spaces (classes, concepts, or just items!),
and that line of work continues to advance (those ideas led to
PGs, and now PGs drive them!). So I am breaking the PG-related
papers into two columns, ranked by time, with a brief
description for each:
On the PGs approach/systems:
- An Information Theoretic Score for Learning Hierarchical
  Concepts, 2023 (focuses on and further develops CORE, but
  describes the system as well). In Frontiers in Computational
  Neuroscience, special topic on Advances in Shannon-based
  Communications and Computations Approaches to Understanding
  Information Processing in the Brain.
- Expedition: A System for the Unsupervised Learning of a
  Hierarchy of Concepts, arXiv 2021 (revival of PGs after a
  ~10-year hiatus! Introduces a version of
  CORE = COherence + REality, a measure of gain in information,
  useful for concept use, ie interpretation/inference). We have
  a good candidate objective now! And PGs become much more
  probabilistic and information theoretic.
- Systems Learning for Complex Pattern Problems, BICA 2008: We
  may need systems, composed of multiple parts (ml algorithms)
  interacting over long periods, for improving at perception
  (and, especially once we add control, because of feedback,
  understanding the development of such systems could be
  interesting and challenging).
- Prediction Games in Infinitely Rich Worlds, 2007 (AAAI
  position paper; longer technical report on the basic
  idea/philosophy and various
  motivations/considerations/challenges).
Selected papers on the prediction sub-problem (online and open-ended, non-stationary, ...):
- Tracking Changing Probabilities via Dynamic Learners, arXiv
  2024 (formalizes open-ended probability prediction and
  advances sparse EMA and counting techniques for the task;
  please see the page on Sparse Moving Averages).
- Efficient Online Learning and Prediction of Users' Desktop
  Actions, IJCAI 2009 (on non-stationarity, continual learning,
  or pure online learning, and personalization; uses sparse EMA).
- On Updates that Constrain the Features' Connections During
  Learning, ACM KDD, 2008 (further focus on types of weight
  updates that keep the number of connections small; introduces
  sparse EMA).
- Learning When Concepts Abound, JMLR 2009 (online and
  open-ended, and just a pure weighted index, no prototypes any
  more! (for the prediction part)).
- Index learning: Recall Systems: Efficient Learning and Use of
  Category Indices, AISTATS, 2007 (index into concept
  prototypes, many-class learning).
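Since sparse EMA recurs in several of the papers above, here is a minimal sketch of the idea: tracking changing probabilities over a large, open-ended item space while keeping only a small set of weights. The class name, learning rate, and pruning threshold below are illustrative choices, not the tuned variants from the papers:

```python
# A compact sketch of a sparse EMA (exponential moving average) for
# tracking changing item probabilities in an open-ended item space.
# Only items with non-negligible weight are kept, so memory stays small.
class SparseEMA:
    def __init__(self, rate=0.05, prune=0.01):
        self.rate = rate    # learning rate: how strongly recency is weighted
        self.prune = prune  # drop items whose weight falls below this
        self.w = {}         # sparse weights, approximate item probabilities

    def update(self, observed):
        """Decay all tracked items, boost the observed one, prune tiny weights."""
        for item in list(self.w):
            self.w[item] *= (1.0 - self.rate)
            if self.w[item] < self.prune:
                del self.w[item]
        self.w[observed] = self.w.get(observed, 0.0) + self.rate

    def prob(self, item):
        return self.w.get(item, 0.0)
```

After a long run of one item its estimated probability approaches 1; if the stream then switches to another item, the estimate tracks the change and the stale item is eventually pruned, so the weight map never grows with the history.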