Because the hemodynamic response unfolds over seconds, it is particularly difficult to assign a sequence of brief cognitive processes to specific brain regions. However, by drawing on convergent information from a variety of methods, including fMRI, EEG and animal neurophysiology, it is possible to make educated guesses about the neural substrates underlying these cognitive processes.
Identification
Early visual areas and portions of the ventral stream, particularly the anterior portion of the temporal lobe, are thought to rapidly analyze the shape information present in a visual stimulus. In particular, neurons of the inferotemporal (IT) cortex respond with great specificity to particular objects (Tanaka 1996), and do so within as little as 120 ms after stimulus onset (Hung, Kreiman, Poggio & DiCarlo 2005). Such representations are also activated by passive viewing of briefly presented stimuli in an RSVP stream (Keysers, Xiao, Foldiak & Perrett 2005), and IT cells are also spatially selective (DiCarlo & Maunsell 2003).
The eSTST model assumes that this stimulus identification occurs in a rapid, feedforward manner, as simulated by other modelling efforts (Thomas Serre’s model of the ventral stream; see also VanRullen 2007 and VanRullen & Thorpe 2002).
Attention
In response to a visual stimulus that matches the target set, the eSTST model simulates a rapid deployment of transient attention (Nakayama & Mackeben 1989, Muller & Rabbitt 1989, Yeshurun & Carrasco 1998). Recent neuroimaging work has suggested that this stimulus-driven attention in RSVP experiments may originate from a region known as the temporoparietal junction, or TPJ (Serences, Shomstein, et al. 2005, Corbetta & Shulman 2002).
Encoding
The eSTST model proposes that a visual target in an RSVP stream is rendered into a tokenized memory representation. For such a target, this encoding persists for hundreds of milliseconds, long after the stimulus has been masked by following items in the RSVP stream. This encoding process is assumed to rely on the coordinated activation of a broad network of brain regions in temporal, parietal and frontal areas that tie the visual features of an object to its phenomenal duration and its spatiotopic coordinates. This perspective is informed by monkey neurophysiology studies, which consistently find working memory correlates for single neurons in prefrontal and parietal areas (cf. Constantinidis & Procyk 2004), as well as MEG studies, which find coordinated activity between these regions during encoding (cf. Hommel et al. 2006).
The form of encoding simulated by eSTST fits well with the notion of access consciousness, and with the widespread recurrent processing that is thought to link stimulus information in visual brain areas with the executive-level representations in frontal areas (Lamme 2003). The eSTST model also assumes that the widespread activation of this encoding process produces the large P3 potential observed in EEG studies (Wikipedia entry for P300).
Tokenized Representations
The ultimate goal of the process of identifying, selecting and encoding a stimulus is to produce a compact, tokenized representation of that stimulus which can be flexibly used by working memory operations. This representation can be part of a temporally ordered sequence of other tokenized stimuli, and item order information seems to be fundamental to the way we store sequential items in working memory. For instance, even when stimuli are presented too closely in time for their order to be reliably encoded, experimental participants are quite confident that they have accurately encoded the sequence (Caldwell-Harris & Morris 2008).
Monkey neurophysiology has long suggested that durable working memory representations reside in areas of the dorsolateral prefrontal cortex (cf. Constantinidis & Procyk 2004), and experiments that examine the neural substrates of sequence memory have found neurons in these prefrontal regions that are highly selective for stimulus sequence (Ninokura, Mushiake & Tanji 2004; Warden & Miller 2007).
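As a purely illustrative aid, the idea of a tokenized, temporally ordered representation can be sketched as a small data structure. This is a toy under stated assumptions, not the eSTST implementation: the names `Token`, `TokenStore`, `encode` and `recall` are hypothetical, and the sketch simply binds each encoded item to its ordinal position so that a sequence can be read back in order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token: binds an item identity to a serial position."""
    item: str      # identity of the encoded stimulus, e.g. a letter
    position: int  # 0-based ordinal position in the encoded sequence


class TokenStore:
    """Toy working-memory store that keeps tokens in encoding order."""

    def __init__(self) -> None:
        self._tokens: List[Token] = []

    def encode(self, item: str) -> Token:
        # Each encoding event creates a new token at the next position.
        token = Token(item=item, position=len(self._tokens))
        self._tokens.append(token)
        return token

    def recall(self) -> List[str]:
        # Read the items back in the order given by their positions.
        ordered = sorted(self._tokens, key=lambda t: t.position)
        return [t.item for t in ordered]


store = TokenStore()
for target in ["K", "R", "K"]:
    store.encode(target)
print(store.recall())  # ['K', 'R', 'K']
```

In this toy, a repeated item produces two distinct tokens that differ only in position, which is one simple way to picture how order information could be carried by the tokens themselves rather than by the item identities.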
