Tuesday, May 14, 2013

Neocorticalism and Its Weaknesses

[Blogger's note:  I went dark halfway through writing this.  Started it back in November, then got diverted into a self-imposed software project and lost interest for a while.  Not sure I'm back to blogging for real--we'll see.  No doubt you, my three readers, are incredibly excited!]

Just finished reading [well, last November] Ray Kurzweil's How to Create a Mind.  This has the usual Kurzweilian arguments about why all information technologies--including those associated with biology and neuroscience--grow at an exponential rate, irrespective of the actual technology base.  It extends his singularity books by going a bit deeper into how to build artificial cortical pattern recognizers, and makes the argument that a hierarchy of these modules pretty much constitutes the human mind and its consciousness.

Kurzweil references Jeff Hawkins's On Intelligence, one of my favorite books of all time.  Kurzweil takes issue with some of the details of Hawkins's model, but the two authors agree on several central points:
  • Both note the existence of a "cortical algorithm", where all areas of the neocortex work pretty much the same and project axons in pretty much the same way.  This means that patterns get recognized the same way everywhere, and project up to higher level pattern recognizers and down to excite, reinforce, or inhibit lower level patterns.
  • Both locate consciousness in the cortex, though they frame it differently: Kurzweil thinks that consciousness is an emergent property of a big neocortex, while Hawkins thinks that consciousness is "what it feels like to have a cortex".
  • Both of them spend a lot of time on input pattern recognition, but very little time on motor outputs.
  • Both treat the thalamus and brain stem as I/O devices, with little to do with consciousness (although Kurzweil does note that the thalamus is essential for consciousness).
  • Hawkins thinks that the secret sauce for complex cognition lies in the time-dependent behavior of cortical activation, where pattern #1 primes the downward links for pattern #2 to be activated if things unfold the way the patterns expect in the immediate future (there's a toy sketch of this priming right after this list).  Kurzweil is strangely silent on time-dependent behavior, which is weird, since speech recognition is so heavily time-dependent.  I think he's glossing over some behavior of hidden Markov models that I'm no expert in.  He may also be slightly cagey about trade secrets.
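To make that priming idea concrete, here's a toy sketch in Python.  The class, the threshold trick, and the "hear" function are all mine--neither author publishes code--but it shows the basic up/down flow: a recognized low-level pattern reports upward, and a downward projection primes the next recognizer to accept a noisier fragment than it otherwise would.

    # Toy hierarchy: low-level recognizers match input fragments and report
    # upward; a downward projection from a fired recognizer primes the unit
    # it expects next, Hawkins-style.  All names and thresholds are made up.
    class Recognizer:
        def __init__(self, name, pattern):
            self.name = name
            self.pattern = pattern    # fragment this unit responds to
            self.primed = False       # set by a downward projection

        def match(self, fragment):
            if fragment == self.pattern:
                return True
            # A primed unit tolerates a noisy fragment -- the downward link
            # has effectively lowered its threshold.
            return self.primed and fragment.startswith(self.pattern)

    units = [Recognizer("unit-ca", "ca"), Recognizer("unit-t", "t")]

    def hear(fragments):
        for i, frag in enumerate(fragments):
            unit = units[i]
            if unit.match(frag):
                print(unit.name, "recognized", repr(frag))
                if i + 1 < len(units):
                    units[i + 1].primed = True   # downward priming
            else:
                print(unit.name, "missed", repr(frag))

    hear(["ca", "t#"])   # the noisy "t#" only passes because it was primed

If you drop the priming line, the second fragment gets rejected, which is roughly Hawkins's point about prediction doing a lot of the recognition work.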
I think that they're barking up the right tree with respect to the cortical algorithm, but they're both minimizing the importance of the older brain structures.  I have some trouble with this "neocorticalist" approach.  Here are my big objections:
  • Neocorticalism can't really explain attention.  As I've said before, I think that attention is likely an incredibly ancient property and is intimately involved with consciousness.  I'm prepared to believe that human-style self-reflection and theory of mind might be new, emergent properties, but they can only emerge because of some repurposing of older mechanisms.
  • Simple pattern matching doesn't get you very good motor performance.  I'm prepared to believe that the motor cortex is coordinating intentions and high-level actions, but the whole business seems way too asynchronous to let us catch balls or play the piano.  The cerebellum is clearly involved, but I suspect that you need a way to make the neocortex semi-synchronous.  There's some evidence that the basal ganglia are involved at least in time perception; I'll bet we're going to discover that the same structures are "clocking" groups of neocortical patterns to coordinate activities in the motor cortex.
  • One of the intriguing things about Hawkins's architecture is that it hinted that the learning process self-organizes the cortical hierarchy: novel or unlearned activities are first handled, somewhat hesitantly, at high levels, and learning then pushes the salient features down into lower-level recognizers--but no mechanism was described for how this happens.  Kurzweil pretty much ignored this, going so far as to posit some kind of central allocator of pattern recognizers.  There's something subtle going on here that pure neocorticalism can't capture.
Since this book was published, Kurzweil has taken a high-level technologist position at Google, which has also bought DNNresearch, Geoff Hinton's startup.  Deep learning, a term popularized by Hinton, has been getting a lot of press recently.  It's pretty clear that Google has decided to pour money into this.  It's a problem that's amenable to implementation in cloud computing, so the fit is pretty good.

People implementing deep learning train each layer in a multi-layer network separately, using unsupervised feature detection.  ("Unsupervised" means that you don't tell the layer when it's done something right or wrong--you merely let it classify inputs as it sees fit.)  This still can't be quite what biological systems do, because they can figure out their own layering, which has to be imposed for deep learning.  My guess is that layering in humans is governed by chunks of the cortex that are genetically predisposed to accept axons from I/O-like areas such as the thalamus and the brainstem.  They therefore learn to detect features associated with that kind of input and project out to a more amorphous set of regions, which can then combine multiple projections into novel features.
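For the curious, here's what that layer-by-layer unsupervised training looks like in a few lines of numpy.  This is my own toy stand-in--a stacked, tied-weight autoencoder with made-up sizes and learning rate, not anything Hinton's group actually runs--and a real system would add biases, minibatches, sparsity, and supervised fine-tuning on top.  But it shows the key move: each layer learns its own features just by reconstructing its input, and the next layer trains on the features the layer below produces.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((200, 16))       # 200 fake input patterns, 16 features

    def train_layer(data, hidden, epochs=200, lr=0.1):
        """Learn one layer's features with no labels: just reconstruct the input."""
        n, d = data.shape
        W = rng.normal(0.0, 0.1, (d, hidden))
        for _ in range(epochs):
            h = np.tanh(data @ W)        # encode
            recon = h @ W.T              # decode with tied weights
            err = recon - data           # reconstruction error
            # Tied-weight gradient: encoder path (through the tanh) plus the
            # linear decoder path.
            grad = data.T @ ((err @ W) * (1 - h ** 2)) + err.T @ h
            W -= lr * grad / n
        return W

    # Each layer trains on the *output* of the layer below -- no labels anywhere.
    activations, weights = X, []
    for hidden in (12, 8, 4):
        W = train_layer(activations, hidden)
        weights.append(W)
        activations = np.tanh(activations @ W)   # feed the features upward

    print("learned weight shapes:", [w.shape for w in weights])

The point of the exercise isn't the math; it's that the layering--how many layers, what size--has to be handed to the network, which is exactly the thing biology apparently figures out for itself.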

This still doesn't define a mechanism where layers compete with each other to identify features at the appropriate level of detail, but you can see how that might emerge with enough feedback between layers.

My guess is that we're going to discover that a huge amount of our cognition is dependent on a fairly arbitrary set of input regions mapping to another fairly arbitrary set of cross-regions, and so on.  But note that "arbitrary" doesn't mean that they're not genetically predetermined.  The stability of those mappings across most humans is what allows us to communicate with each other and what allows us to perceive most individuals as "sane".  One of the most interesting things about figuring out how all this hangs together is the possibility of producing entities that think radically differently from us but which still can extract insights about the universe that we're not optimized to perceive.

Things are going to move pretty quickly from here.  We aren't close to a strong AI using this kind of technology, but the Watson/Jeopardy thing shows that you can get an awful lot of interesting work out of something that you wouldn't necessarily want at your dinner party.
