How does the human brain learn to read? Reading acquisition relies on creating and developing an interface between vision and spoken language, in charge of orthographic analysis. Over the past 20 years, some basic features of this interface have been brought to light. In particular, data support the emergence, during reading acquisition, of a specific region underlying orthographic coding. This region, the visual word form area (VWFA), has a fixed location in the left ventral visual cortex and responds selectively to written words, more than to other visual stimuli. However, how neural circuits at this site implement an invariant recognition of written words remains unknown.
On the one hand, this region could arise through the repurposing, for letter recognition, of a subpart of the ventral visual pathway initially involved in face and object recognition (the neuronal recycling hypothesis). On the other hand, its reproducible localization could be due to pre-existing connections from this subregion to areas involved in spoken language processing (the biased connectivity hypothesis).
In a recent study led by Stanislas Dehaene, researchers from UNICOG/NeuroSpin, in collaboration with the Paris Brain Institute (ICM), assessed to what extent a minimal computational model of these two hypotheses may suffice to account for the emergence of the VWFA during reading acquisition. The researchers focused on the learning of words and on how their combinations of letters are represented. They designed biologically plausible artificial deep neural networks inspired by the ventral visual cortex (convolutional neural networks, or CNNs), whose architecture was not designed for reading. As occurs in children, a standard CNN was first trained to identify pictures of various objects and scenes, and then trained on a set of 1,000 written words of different lengths, presented across variations in location, size, font and case. They tested the biased-connectivity hypothesis by comparing networks whose dense layer was either fully connected to all output units, or in which only a subset of dense units was connected to the word output units, simulating a putative VWFA.
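To make the biased-connectivity manipulation concrete, here is a minimal sketch in PyTorch of such a readout layer, in which word output units receive input from only a small subset of dense units while object output units remain fully connected. All names and sizes here (BiasedReadout, n_dense, n_vwfa, and so on) are illustrative assumptions, not the authors' actual code or parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedReadout(nn.Module):
    """Classification layer with biased connectivity: object output units
    are connected to every dense unit, while word output units receive
    input only from the first n_vwfa dense units (the putative 'VWFA')."""

    def __init__(self, n_dense=512, n_vwfa=64, n_objects=1000, n_words=1000):
        super().__init__()
        self.fc = nn.Linear(n_dense, n_objects + n_words)
        # Binary connectivity mask: one row per output unit, one column per
        # dense unit. Word rows keep only the first n_vwfa columns.
        mask = torch.ones(n_objects + n_words, n_dense)
        mask[n_objects:, n_vwfa:] = 0.0
        self.register_buffer("mask", mask)

    def forward(self, h):
        # Masking the weights on every forward pass also zeroes the gradients
        # of the severed connections, so they stay absent during training.
        return F.linear(h, self.fc.weight * self.mask, self.fc.bias)

# Usage: place this readout on top of an image-trained CNN backbone,
# train on objects and scenes first, then on rendered word images.
readout = BiasedReadout()
dense_activity = torch.randn(8, 512)   # stand-in for the CNN's dense layer
logits = readout(dense_activity)       # shape: (8, 2000)
```

In this toy setting, the fully connected control network corresponds simply to a mask of all ones, so the two hypotheses differ by a single connectivity constraint.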
They show that their models can account for many properties of the VWFA, particularly when a subset of units possesses biased connectivity to the word output units. The network develops a sparse, invariant representation of written words, based on a restricted set of reading-selective units. The activation of these units mimics several properties of the VWFA, and lesioning them causes a reading-specific deficit. The model predicts that, in literate brains, written words are encoded by a compositional neural code, with neurons tuned either to individual letters at an ordinal position relative to the start or end of the word, or to pairs of letters (bigrams).
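The lesioning result can be illustrated in the same toy setting: silencing the reading-selective units should leave object recognition largely intact while selectively degrading word recognition. The sketch below builds on the hypothetical BiasedReadout above and shows one way to simulate such a lesion; it is an assumed illustration, not the study's analysis code.

```python
import torch

@torch.no_grad()
def lesion_vwfa(readout, n_vwfa=64):
    """Simulate a VWFA lesion by severing all outgoing connections of the
    reading-selective dense units (here, the first n_vwfa of them)."""
    readout.fc.weight[:, :n_vwfa] = 0.0

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Top-1 accuracy over a labelled evaluation set."""
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

# Expected pattern after lesion_vwfa(readout): word accuracy collapses
# while object accuracy is largely spared, i.e. a reading-specific deficit.
```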
These predictions may soon become testable, either with very high-resolution high-field fMRI or with high-density intracranial recordings. Moreover, these results constitute only a first modelling step. Indeed, other representations (in particular, phonological representations) are known to influence ventral visual representations in humans. A more complex, recurrent architecture, combining visual and phonological inputs, would be needed to capture those observations accurately.