[under construction] Human intelligence appears to require
many concepts (that work well enough together), for instance
to achieve daily 'common sensical' behavior. There are two
distinct but inter-related problems here, in the way I have
understood and approached these issues:
- [Time snapshot] How does one (a person), efficiently and adequately,
figure out (mostly unconsciously) which of one's many concepts
are useful at any time point (for instance when looking at a picture)?
- [History/Development] Where do these many concepts
come from, in the first place? (and where are they going?!)
A few other ensuing or related questions include: what
is a concept? and, for me, how can machine learning (ml), and
more broadly, computational thinking, help? (eg with the
wealth of learning and inference techniques that continue to
be developed)
Cognitive scientists tell us that most concepts (such as: water, chair,
my friend's voice, a house, my house, democracy, ...) develop
over time, in a sequential and cumulative
manner. This development appears to be largely unsupervised,
ie with no explicit teacher. By the time a child shows signs of
learning a language, many such concepts, if they are to be
useful, must already have been developed (somewhat), as the
child can already do a lot in her/his world. In the 2000s, I proposed
Prediction Games (PGs) to develop and study such
systems. In PGs, a system composed of
multiple learning and inferencing parts, given its input stream
(broken into episodes), plays the game of prediction
on it. The system tries to get better at the prediction task
over time: predicting more extensively, into the future or into
space, with possibly less input (becoming faster and more
powerful). This has a plausible survival advantage! Thus,
prediction, of one's world, is a unifying task: could it be
sufficient for providing the feedback needed to achieve the
conceptual complexity of humans (assuming learning is the main
vehicle of development)? But of course, we need more details.
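The prediction-game loop can be sketched minimally. The class name, the episode format, and the simple co-occurrence predictor below are illustrative assumptions for exposition, not the actual PGs machinery:

```python
# A minimal sketch of a prediction-game loop: the system predicts the
# next concept in each episode, then learns from what actually occurred.
# (Hypothetical names/interfaces; the real PGs systems are far richer.)
from collections import defaultdict

class PredictionGame:
    def __init__(self):
        # co-occurrence counts: counts[a][b] = times concept b followed a
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, concept):
        """Predict the most likely next concept given the current one."""
        followers = self.counts[concept]
        return max(followers, key=followers.get) if followers else None

    def play_episode(self, episode):
        """Predict each next concept, then update from the outcome.

        Returns the fraction of correct predictions in this episode."""
        correct = 0
        for cur, nxt in zip(episode, episode[1:]):
            if self.predict(cur) == nxt:
                correct += 1
            self.counts[cur][nxt] += 1  # learn only after predicting
        return correct / max(len(episode) - 1, 1)
```

On a stream with regularities, the per-episode accuracy returned by `play_episode` rises over time, which is the sense in which the system "gets better at the prediction task."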
In order to get better at prediction, the PGs system keeps
expanding its hierarchical networked vocabulary of
concepts. In PGs, concepts are both the predictors and
the predictands (ie the targets of prediction). This
symmetry is a major draw of this approach to
me.
Furthermore, concepts not only predict one another but are also
built out of one another, akin to Lego pieces. This is the
cumulative (constructivist!) part of the approach. To start
the whole process of learning, the system is given an initial
set of primitive (innate/hardwired) concepts, with the
capability to break its raw sensed input in an episode into
those primitives. So each episode begins with a sequence of
primitives in the input buffer. In order to predict with
concepts, the system needs to break or map its buffer contents
into its current most useful concepts. This process is
called interpretation (prediction and interpretation
are intertwined). By practicing many interpretation episodes,
the system (continually) figures out which concepts (new and
old) go together: predict each other, and could perhaps be
joined to make larger more useful concepts. PGs involve a
continual self-supervised and cumulative online learning
process.
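As a toy illustration of interpretation, a buffer of character primitives can be segmented against the current vocabulary by greedy longest match. The function name, the greedy strategy, and the string representation are assumptions made for illustration; the actual systems make these local decisions from noisy predictive evidence rather than exact matching:

```python
# A toy sketch of interpretation: mapping a buffer of primitive concepts
# (single characters) onto the current vocabulary of composite concepts,
# preferring the longest known concept that matches at each position.
def interpret(buffer, vocabulary):
    """Segment `buffer` into known concepts, falling back to primitives."""
    parts, i = [], 0
    while i < len(buffer):
        # try the longest candidate first; a single primitive always matches
        for length in range(len(buffer) - i, 0, -1):
            piece = buffer[i:i + length]
            if piece in vocabulary or length == 1:
                parts.append(piece)
                i += length
                break
    return parts

# primitives are characters; composites were joined in earlier episodes
vocab = {"th", "the", "cat"}
print(interpret("thecat", vocab))  # ['the', 'cat']
```

Note how adding the composite "the" changes the interpretation of the whole buffer, which in turn changes what predicts what: this is one way interpretation and prediction are intertwined.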
I currently believe the research is most relevant to learning in
perception. Over the
years, I have built a few versions of such systems that play
the game on one-dimensional text (see the pointers below).
There are many challenges: How do we avoid combinatorial
explosion (because such systems work with explicit
structures)? These systems make local decisions, eg in
determining which concepts to activate in a given episode, or
which concepts to join to make composite concepts, based on
noisy indirect information. There is much uncertainty (hence
the picture above of a house of cards!): can the system build
a robust networked hierarchy of concepts? How do we design
algorithms and objectives so that learning can go on and not
get stuck in poor local optima? When or why should such
systems succeed over time? There are also questions of
(code/engineering) complexity and of how to control and
understand the dynamics of interacting subsystems: for instance,
how biased will such a system, which learns sequentially, be?
(What it learns now affects its future choices and learning.)
There are many problems and
subproblems to be discovered and defined here. Along the way,
problems of philosophical nature arise too, as the system is
building its own (biased) reality in a sense (so then, is
there 'one unique objective truth out there'?). I think this
makes for a good project and research area!
Our (human) thoughts are built on concepts (and embodied!). A few
references:
- The Big Book of Concepts. Gregory L. Murphy, MIT Press, 2002.
- Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. George Lakoff and Mark Johnson, Basic Books, 1999.
- Surfaces and Essences: Analogy as the Fuel and Fire of Thinking. Douglas R. Hofstadter and Emmanuel Sander, Basic Books, 2013.
[More material (pointer and papers) to go here. ]
My work on PGs began with the problems of learning under large
and growing output spaces (classes, concepts, or just items!),
and that line of work continues to advance (those ideas led to
PGs, and now PGs drive them!). So I am breaking the PG-related
papers into two columns, ranked by time, with a brief
description for each:
On the PGs approach/systems:
- An Information Theoretic Score for Learning Hierarchical
  Concepts, 2023 (focuses on and further develops CORE, but
  describes the system as well). In Frontiers in Computational
  Neuroscience, special topic on Advances in Shannon-based
  Communications and Computations Approaches to Understanding
  Information Processing in the Brain.
- Expedition: A System for the Unsupervised Learning of a
  Hierarchy of Concepts, arXiv 2021 (revival of PGs after a
  ~10-year hiatus! Introduces a version of
  CORE = COherence + REality, a measure of gain in information,
  useful for concept use, ie interpretation/inference). We have
  a good candidate objective now! And PGs become much more
  probabilistic and information theoretic.
- Systems Learning for Complex Pattern Problems, BICA 2008: We
  may need systems, composed of multiple parts (ml algorithms)
  interacting over long periods, for improving at perception
  (and, especially once we add control, because of feedback,
  understanding the development of such systems could be
  interesting and challenging).
- Prediction Games in Infinitely Rich Worlds, 2007 (AAAI
  position paper; longer technical report on the basic
  idea/philosophy and various
  motivations/considerations/challenges).
Selected papers on the prediction sub-problem (online and open-ended, non-stationary, ...):
- Tracking Changing Probabilities via Dynamic Learners, arXiv
  2024 (formalizes open-ended probability prediction and
  advances sparse EMA and counting techniques for the task;
  please see the page on Sparse Moving Averages).
- Efficient Online Learning and Prediction of Users' Desktop
  Actions, IJCAI 2009 (on non-stationarity, continual learning,
  or pure online learning, and personalization; uses sparse EMA).
- On Updates that Constrain the Features' Connections During
  Learning, ACM KDD, 2008 (further focus on types of weight
  updates that keep the number of connections small; introduces
  sparse EMA).
- Learning When Concepts Abound, JMLR 2009 (online and
  open-ended, and just a pure weighted index, no prototypes any
  more! (for the prediction part)).
- Index learning: Recall Systems: Efficient Learning and Use of
  Category Indices, AISTATS, 2007 (index into concept
  prototypes, many-class learning).
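Since sparse EMA recurs in several of the papers above, here is a minimal sketch of the idea: tracking changing probabilities over a large, open-ended item space while keeping only a small set of weights. The class name, learning rate, and pruning threshold below are illustrative choices, not the tuned variants from the papers:

```python
# A compact sketch of a sparse EMA (exponential moving average) for
# tracking changing item probabilities in an open-ended item space.
# Only items with non-negligible weight are kept, so memory stays small.
class SparseEMA:
    def __init__(self, rate=0.05, prune=0.01):
        self.rate = rate    # learning rate: how strongly recency is weighted
        self.prune = prune  # drop items whose weight falls below this
        self.w = {}         # sparse weights, approximate item probabilities

    def update(self, observed):
        """Decay all tracked items, boost the observed one, prune tiny weights."""
        for item in list(self.w):
            self.w[item] *= (1.0 - self.rate)
            if self.w[item] < self.prune:
                del self.w[item]
        self.w[observed] = self.w.get(observed, 0.0) + self.rate

    def prob(self, item):
        return self.w.get(item, 0.0)
```

After a long run of one item its estimated probability approaches 1; if the stream then switches to another item, the estimate tracks the change and the stale item is eventually pruned, so the weight map never grows with the history.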