Recently, a cortical model for visual grouping and segmentation (the Laminart model) has been integrated to the NRP. From there, the goal was to build a whole visual system on the NRP, connecting many models for different functions of vision (retina processing, saliency computation, saccades generation, predictive coding, …) in a single virtual experiment, including the Laminart as a model for early visual processing. While this process is on-the-go (see here), some scientifically relevant progress already arose from the premises of this implementation. This is what is going to be explained right now.
The Laminart is one of the only models being able to satisfactorily explain how crowding occurs in the visual system. Crowding is a visual phenomenon that happens when perception of a target deteriorates in the presence of nearby elements. Crowding occurs in real life (for example when driving in the street, see fig. 1a) and is widely studied in many psychophysical experiments (see fig. 1b). Crowding happens ubiquitously in the visual system and must thus be accounted for by any complete model of the visual system.
While crowding was for a long time believed to be driven by local interactions in the brain (e.g. decremental feed-forward pooling of different receptive fields along the hierarchy of the visual system, jumbling the target’s visual features with the one from nearby elements), it recently appeared that adding remote contextual elements can still modulate crowding (see fig. 1c). The entire visual configuration is eligible to determine what happens at the very tiny scale of the target!
To account for this exciting phenomenon (named uncrowding), Francis et al. (2017) proposed a model that parses the visual stimulus in several groups, using low-level, cortical dynamics (arising from a biologically plausible and laminarly structured network of spiking neurons, with fixed connectivity). Crucially, the Laminart is a 2-stage model in which the input image is segmented in different groups before any decremental interaction can happen between the target and nearby elements. In other words: how elements are grouped in the visual field determines how crowding occurs, making the latter a simple and behaviourally measurable phenomenon that unambiguously describes a central feature of human vision (grouping). In fig. 1c (right), the 7 squares form a group that frames the target, instead of interfering with it, hence enhancing performance. In the Laminart model, the 7 squares are grouped together by illusory contours and are segmented out, leaving a privileged access to the target left alone. However, in order to work, the Laminart model needs to start the segmentation spreading process somewhere (see fig. 2).
Up to now, the model was sending ad-hoc top-down signals, lacking an explicit process to generate them. Here, using the NRP, we could easily connect it to a model for saliency computation that was just integrated to the platform. Feeding the Laminart, the saliency computation model delivers its output as a bottom-up influence to where segmentation signals should arise. On the NRP, we created the stimuli appearing in the experimental results shown in fig. 1c, and presented them to the iCub robot. In this experiment, each time a segmentation signal is sent, its location is sampled from the saliency computation map, linking both models in an elegant manner. Notably, when only 1 squares flanks the target, the saliency map is merely a big blob around the target, whereas when 7 squares flank the target, the saliency map is more peaky around the outer squares (see fig. 3). Consequently, the more squares there are, the more probable it is that the segmentation signals succeed in creating 2 groups from the flankers and the target, releasing the target from crowding. This fits very well with the results of figure 1c. The next step for this project is to reproduce the results quantitatively on the NRP.
To sum up, building a visual system on the NRP, we could easily make the connection between a saliency computation model and our Laminart model. This connection greatly enhanced the latter model and gives it the opportunity to explain very well how uncrowding occurs in human vision and the low-level mechanisms of visual grouping. In the future, we will run psychophysical experiments in our lab, where it is possible to disentangle top-down from bottom-up influence on uncrowding, seeing whether a strong influence of saliency computation on visual grouping makes any sense.