Synthetic voices in the foreign language context


Bione Alves, Tiago (2017) Synthetic voices in the foreign language context. Masters thesis, Concordia University.

Second language (L2) researchers and practitioners have explored the pedagogical capabilities of text-to-speech synthesizers (TTS) for their potential to enhance the acquisition of writing (Kirstein, 2006), vocabulary and reading (Proctor, Dalton, & Grisham, 2007), and pronunciation (Cardoso, Collins, & White, 2012; Liakin, Cardoso, & Liakina, 2017; Soler-Urzua, 2011). Despite the positive evidence supporting the use of TTS as a learning tool, these applications still need to be formally evaluated for their potential to promote the conditions under which languages are acquired, particularly in an English as a foreign language (EFL) environment, as suggested by Cardoso, Smith, and Garcia Fuentes (2015).
The current study evaluated the voice of a modern English TTS system, used in an EFL context in Brazil, in terms of its speech quality, its ability to be understood by L2 users, and its potential to support a focus on specific language forms. The evaluation was operationalized through the following criteria: (1) users’ ratings of holistic features (comprehensibility, naturalness, and accuracy, as defined by Derwing & Munro, 2005); (2) intelligibility (the extent to which a message is actually understood), measured with a dictation task; (3) text comprehension (i.e., users’ ability to understand a text and answer comprehension questions); and (4) users’ ability to hear a specific morpho-phonological feature (i.e., the aural identification of the English past-tense marker -ed).
Twenty-nine Brazilian EFL learners listened to stories and sentences, produced alternately by a TTS voice and a human speaker, and rated them on a 6-point Likert scale according to the holistic criteria above (comprehensibility, naturalness, and accuracy). They also answered a set of comprehension questions assessing their ability to understand what they had heard. To measure intelligibility, participants completed a dictation task in which they transcribed utterances, as recommended by Derwing and Munro (2005). Finally, participants listened to 16 sentences and judged whether the target feature (the past-tense marker -ed) was present in each. After these tasks were completed, semi-structured interviews were conducted to collect data on participants’ perceptions of the technology.
Results indicate that the TTS and human voices were perceived similarly in terms of comprehensibility, whereas the TTS voice received less favorable ratings for naturalness. In addition, participants’ performance was comparable across the two voices on the text comprehension, dictation, and aural identification (past -ed) tasks. These findings suggest that TTS systems have the potential to serve as pedagogical tools for L2 learning, particularly in EFL settings, where naturalistic exposure to the target language is limited or non-existent.

Divisions: Concordia University > Faculty of Arts and Science > Education
Item Type: Thesis (Masters)
Authors: Bione Alves, Tiago
Institution: Concordia University
Degree Name: M.A.
Program: Applied Linguistics
Date: 12 September 2017
Thesis Supervisor(s): Cardoso, Walcir
Keywords: text-to-speech synthesis, TTS, pronunciation, English as a Foreign Language
ID Code: 983062
Deposited By: TIAGO ALVES
Deposited On: 17 Nov 2017 18:49
Last Modified: 18 Jan 2018 17:56


