Theme 1 “Language and Cognition: Corpus, Behaviour and Education”

Coordinators: Sylvie HANOTE (PR, FORELL) and Manuel GIMENES (MCF, CERCA)

This research area is currently centred on one common theme: the syllable and syllabification.

  • In the “Writing” programme (Coordinator: Eric LAMBERT, MCF, CERCA), researchers examine the relevance of the syllable as a unit that structures the temporal pattern of writing and as a unit of spelling acquisition when written language is learned. A further study addresses the relationship between morphology and syllabic cut.
  • In the “Speech” programme (Coordinator: Sylvie HANOTE, PR, FORELL), researchers address similar questions, diachronically and synchronically, in French and English: for example, the evolution of French vowel duration in relation to syllable structure, or the links between morphology, syllabic cut and lexical transparency in the pronunciation of prefixed, suffixed and compound words. In partnership with the XLIM laboratory (Universities of Poitiers and Limoges), studies are under way to define the acoustic correlates of syllabic cut.
  • In the “Educational Tools and New Digital Technologies” programme (Coordinators: Freiderikos VALETOPOULOS, PR, FORELL, and Jacques BOUCHAND, MCF, TECHNE), the focus is on learners’ perception of the syllable and of syllabic cut. In connection with the University of Poitiers IDEFI-PARE project, a study is planned on note-taking at university and on the status of the syllable in note-taking among advanced Francophone learners (in connection with the School of Teaching and Education) and advanced allophone learners (in connection with the French as a Foreign Language Centre).


Achievements of the research theme (corpora and databases):

  • Corpus “Word”:

Description: A corpus of authentic broadcast documents. The English documents come from BBC Radio 4 (UK) and National Public Radio (US); the French documents come from France Inter and France 3 Television. As of 30 April 2015, the corpus comprises 80,000 words (around 8 hours of audio). It is designed for the analysis of data in context. It currently focuses on English (UK and US) but contains some aligned documents in French and Spanish, and it is meant to be extended to other varieties of English, to other languages such as German, Greek or Romanian, and to multilingual comparable corpora.

Document Format: .wav

Alignment: The documents are transcribed orthographically and aligned with the audio (text-to-sound); the alignment was performed with Praat.

Software: the corpus can be searched with Dolmen, developed by Julien Eychenne.

Contacts: Sylvie Hanote and Nicolas Videau

Using the corpus: This corpus is intended for researchers working in various fields of spoken-language linguistics: phonetics/phonology, morphology, lexicology, syntax, semantics and discourse analysis. So that the corpus can serve as many users as possible, the transcription was kept orthographic and follows specific conventions, which lets each researcher adapt it as needed.



  • “Worldlex” Database

Description: Worldlex is a database providing lexical frequency and contextual diversity measures for several thousand words in 66 languages. These measures, calculated from corpora of tweets, blog posts and online newspaper articles, reliably predict reaction times in lexical decision tasks (Gimenes and New, under review).

Document Format: .txt

Using the database: These frequencies are mainly intended for psycholinguistics researchers who want to study or control for lexical frequency, or to make cross-language comparisons.

Contacts: Manuel Gimenes and Boris New
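
Example: As an illustration of how frequencies from a .txt file might be used to control for lexical frequency, the following Python sketch selects words whose log frequency falls within a chosen band. It is only a sketch under stated assumptions: the tab-separated layout, the column names `Word` and `FreqPerMillion`, and the sample values are hypothetical and do not describe the actual Worldlex files.

```python
import csv
import io
import math

# Hypothetical sample data: layout, column names and values are assumptions
# made for illustration, not the real Worldlex format.
SAMPLE = (
    "Word\tFreqPerMillion\n"
    "the\t50000.0\n"
    "house\t500.0\n"
    "vowel\t5.0\n"
    "syllable\t2.0\n"
)


def load_frequencies(handle):
    """Read a tab-separated file into a {word: frequency per million} dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Word"]: float(row["FreqPerMillion"]) for row in reader}


def log_frequency(freq_per_million):
    """Return log10 of the frequency per million words."""
    return math.log10(freq_per_million)


def words_in_band(freqs, low, high):
    """Return, in alphabetical order, the words whose log10 frequency
    lies between low and high (inclusive). Zero frequencies are skipped."""
    return sorted(
        word
        for word, freq in freqs.items()
        if freq > 0 and low <= log_frequency(freq) <= high
    )


# With a real file, open(path, encoding="utf-8", newline="") would replace
# the in-memory sample used here.
freqs = load_frequencies(io.StringIO(SAMPLE))
low_frequency_items = words_in_band(freqs, 0.0, 1.0)
print(low_frequency_items)
```

Selecting experimental items from the same frequency band in this way is one simple means of keeping lexical frequency comparable across conditions; the same band could be applied to files for different languages when making cross-language comparisons.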



