Synthetic voices in the foreign language context


Bione Alves, Tiago (2017) Synthetic voices in the foreign language context. Masters thesis, Concordia University.



Second language (L2) researchers and practitioners have explored the pedagogical capabilities of text-to-speech synthesizers (TTS) for their potential to enhance the acquisition of writing (Kirstein, 2006), vocabulary and reading (Proctor, Dalton, & Grisham, 2007), and pronunciation (Cardoso, Collins, & White, 2012; Liakin, Cardoso, & Liakina, 2017; Soler-Urzua, 2011). Despite positive evidence supporting the use of TTS as a learning tool, these applications still need to be formally evaluated for their potential to promote the conditions under which languages are acquired, particularly in an English as a foreign language (EFL) environment, as suggested by Cardoso, Smith, and Garcia Fuentes (2015).
The current study evaluated the voice of a modern English TTS system, used in an EFL context in Brazil, in terms of its speech quality, its ability to be understood by L2 users, and its potential for focus on specific language forms. The evaluation was operationalized through the following criteria: (1) users’ ratings of holistic features (comprehensibility, naturalness, and accuracy, as defined by Derwing & Munro, 2005); (2) intelligibility (the extent to which a message is actually understood), measured with a dictation task; (3) text comprehension (i.e., users’ ability to understand a text and answer comprehension questions); and (4) users’ ability to hear a specific morpho-phonological feature (i.e., the aural identification of English past tense -ed).
Twenty-nine Brazilian EFL learners listened to stories and sentences, produced alternately by a TTS voice and a human, and rated them on a 6-point Likert scale according to the above-mentioned holistic criteria (comprehensibility, naturalness, and accuracy). In addition, they answered a set of comprehension questions to assess their ability to understand what they had heard. To measure intelligibility, participants completed a dictation task in which they transcribed utterances, as recommended by Derwing and Munro (2005). Finally, participants completed an aural identification task in which they judged, for each of 16 sentences, whether the target feature (the past tense marker -ed) was present. After these tasks were completed, semi-structured interviews were conducted to collect data regarding participants’ perceptions of the technology.
Results indicate that the TTS and human voices were perceived similarly in terms of comprehensibility, while ratings for naturalness were unfavorable for the TTS voice. In addition, participants performed comparably with both voices on the tasks involving text comprehension, dictation, and identifying a target linguistic form (past -ed) in aural input. These findings suggest that TTS systems have the potential to be used as pedagogical tools for L2 learning, particularly in an EFL setting where natural occurrence of the target language is limited or non-existent.

Divisions: Concordia University > Faculty of Arts and Science > Education
Item Type: Thesis (Masters)
Authors: Bione Alves, Tiago
Institution: Concordia University
Degree Name: M.A.
Program: Applied Linguistics
Date: 12 September 2017
Thesis Supervisor(s): Cardoso, Walcir
Keywords: text-to-speech synthesis, TTS, pronunciation, English as a Foreign Language
ID Code: 983062
Deposited By: Tiago Alves
Deposited On: 17 Nov 2017 18:49
Last Modified: 18 Jan 2018 17:56


Bailly, G. (2003). Close shadowing natural versus synthetic speech. International Journal of Speech Technology, 6(1), 11–19.
Barcroft, J., & Sommers, M. S. (2005). Effects of acoustic variability on second language vocabulary learning. Studies in Second Language Acquisition, 27(3), 387–414.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57(1), 289–300.
Bione, T., Grimshaw, J., & Cardoso, W. (2016). An evaluation of text-to-speech synthesizers in the foreign language classroom: Learners’ perceptions. In S. Papadima-Sophocleous, L. Bradley, & S. Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016, Limassol, Cyprus (pp. 50–54). Dublin, IE: Research-publishing.net.
British Council Brasil (2015). O Ensino de Inglês na Educação Pública Brasileira. São Paulo, BR: British Council.
Cardoso, W., Smith, G., & Garcia Fuentes, C. (2015). Evaluating text-to-speech synthesizers. In F. Helm, L. Bradley, M. Guarda, & S. Thouësny (Eds.), Critical CALL – Proceedings of the 2015 EUROCALL Conference, Padova, Italy (pp. 108–113). Dublin, IE: Research-publishing.net.
Celce-Murcia, M., Brinton, D. M., & Goodwin, J. M. (2010). Teaching pronunciation: A course book and reference guide. New York: Cambridge University Press.
Chapelle, C. A. (2001a). Computer applications in second language acquisition: Foundations for teaching testing and research. Cambridge, UK: Cambridge University Press.
Chapelle, C. A. (2001b). Innovative language learning: Achieving the vision. ReCALL, 23(10), 3–14.
Chapelle, C. A. (2003). English language learning and technology: Lectures on Applied Linguistics in the age of information and communication. Amsterdam, NL: John Benjamins.
Chiu, T. L., Liou, H. C., & Yeh, Y. (2007). A study of web-based oral activities enhanced by automatic speech recognition for EFL college learning. Computer Assisted Language Learning, 20(3), 209–233.
Collins, L., Trofimovich, P., White, J., Cardoso, W., & Horst, M. (2009). Some input on the easy/difficult grammar question: An empirical study. The Modern Language Journal, 93(3), 336–353.
Collins, L., & Muñoz, C. (2016). The foreign language classroom: Current perspectives and future considerations. The Modern Language Journal, 100(S1), 133–147.
Cope, C., & Ward, P. (2002). Integrating learning technology into classrooms: The importance of teachers’ perceptions. Educational Technology & Society, 5(1), 67–74.
Council of Europe (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge, UK: Cambridge University Press.
Delogu, C., Conte, S., & Sementina, C. (1998). Cognitive factors in the evaluation of synthetic speech. Speech Communication, 24(2), 153–168.
Derakhshan, A., & Khodabakhshzadeh, H. (2011). Why CALL why not MALL: An in-depth review of text-message vocabulary learning. Theory and Practice in Language Studies, 1(9), 1150–1159.
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A research-based approach. TESOL Quarterly, 39(3), 379–397.
Díez-Bedmar, M. B., & Pérez-Paredes, P. (2012). The types and effects of peer native speakers’ feedback on CMC. Language Learning & Technology, 16(1), 62–90.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143–188.
Ellis, N. C. (2006). Selective attention and transfer phenomena in L2 acquisition: Contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Applied Linguistics, 27(2), 164–194.
Ellis, N., & Collins, L. (2009). Input and second language acquisition: The roles of frequency, form, and function. Introduction to the special issue. The Modern Language Journal, 93(3), 329–336.
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363–406.
Fiori, M. L. (2005). The development of grammatical competence through synchronous computer-mediated communication. CALICO Journal, 22(3), 567–602.
Gass, S. M., & Mackey, A. (2007). Input, interaction, and output in second language acquisition. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 175–199). New Jersey: Lawrence Erlbaum.
Goldstein, M. (1995). Classification of methods used for assessment of text-to-speech systems according to the demands placed on the listener. Speech Communication, 16(3), 225–244.
Grimshaw, J., Cardoso, W., & Waddington, D. (2016). Can a ‘shouting’ digital game help learners develop oral fluency in a second language? In S. Papadima-Sophocleous, L. Bradley, & S. Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016, Limassol, Cyprus (pp. 172–177). Dublin, IE: Research-publishing.net.
Handley, Z., & Hamel, M.-J. (2005). Establishing a methodology for benchmarking speech synthesis for computer-assisted language learning (CALL). Language Learning & Technology, 9(3), 99–120.
Handley, Z. (2009). Is text-to-speech synthesis ready for use in computer-assisted language learning? Speech Communication, 51(10), 906–919.
Herrington, R. (2002). Controlling the false discovery rate in multiple hypothesis testing. Research and Statistical Support [Website]. University of North Texas, Denton, USA. Retrieved from https://it.unt.edu/sites/default/files/rss-id-false-discovery-rate-hypothesis-testing.pdf
Honey, M., Culp, K. M., & Carrigg, F. (2000). Perspectives on technology and education research: Lessons from the past and present. Journal of Educational Computing Research, 23(1), 5–14.
Horst, M., Cobb, T., & Meara, P. (1998). Beyond a clockwork orange: Acquiring second language vocabulary through reading. Reading in a Foreign Language, 11(2), 207–223.
Idiomas Sem Fronteiras (2017, January 28). Inglês Sem Fronteiras - Curso presencial [Website]. Retrieved from http://isf.mec.gov.br/ingles/pt-br/curso-presencial
Jamieson, J., & Chapelle, C. A. (2010). Evaluating CALL use across multiple contexts. System, 38(3), 357–369.
John, P., & Cardoso, W. (2016). A comparative study of text-to-speech and native speaker output. In J. Demperio, E. Rosales, & S. Springer (Eds.), Proceedings of the Meeting on English Language Teaching (pp. 78–96). Québec, CA: Université du Québec à Montréal Press.
Kang, M., Kashiwagi, H., Treviranus, J., & Kaburagi, M. (2008). Synthetic speech in foreign language learning: An evaluation by learners. International Journal of Speech Technology, 11(2), 97–106.
Kirstein, M. (2006). Universalizing universal design: Applying text-to-speech technology to English language learners’ process writing (Doctoral dissertation). University of Massachusetts, Boston, MA.
Krashen, S. (1985). The input hypothesis: Issues and implications. New York: Longman.
Larson-Hall, J. (2010). A guide to doing statistics in second language research using SPSS. New York: Routledge.
Leow, R. P. (2015). Conclusion: The changing L2 classroom, and where do we go from here? In Leow, R. P. (Ed.), Explicit learning in the L2 classroom: A student-centered approach (pp. 270–278). New York: Routledge.
Levy, M., & Hubbard, P. (2005). Why call CALL “CALL”? Computer Assisted Language Learning, 18(3), 143–149.
Liakin, D., Cardoso, W., & Liakina, N. (2015). Learning L2 pronunciation with a mobile speech recognizer: French /y/. CALICO Journal, 32(1), 1–25.
Liakin, D., Cardoso, W., & Liakina, N. (2017). The pedagogical use of mobile speech synthesis (TTS): Focus on French liaison. Computer Assisted Language Learning, 30(3–4), 348–365.
Lightbown, P. M. (2003). SLA research in the classroom/SLA research for the classroom. The Language Learning Journal, 28(1), 4–13.
Lin, C. H., Fisher, B. E., Winstein, C. J., Wu, A. D., & Gordon, J. (2008). Contextual interference effect: Elaborative processing or forgetting-reconstruction? A post hoc analysis of transcranial magnetic stimulation-induced effects on motor learning. Journal of Motor Behavior, 40(6), 578–586.
Lu, M. (2008). Effectiveness of vocabulary learning via mobile phone. Journal of Computer Assisted Learning, 24(6), 515–525.
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners. Language Learning, 45(1), 73–97.
Nation, P., & Wang Ming-Tzu, K. (1999). Graded readers and vocabulary. Reading in a Foreign Language, 12(2), 355–380.
Neri, A., Cucchiarini, C., & Strik, H. (2003). Automatic speech recognition for second language learning: How and why it actually works. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Conference on Phonetic Sciences, Barcelona, Spain (pp. 1157–1160). Adelaide, AU: Causal Productions Pty Ltd.
Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2002). The pedagogy-technology interface in computer assisted pronunciation training. Computer Assisted Language Learning, 15(5), 441–467.
Nusbaum, H. C., Francis, A. L., & Henly, A. S. (1995). Measuring the naturalness of synthetic speech. International Journal of Speech Technology, 2(1), 7–19.
Nye, P. W., Ingemann, F., & Donald, L. (1975). Synthetic speech comprehension: A comparison of listener performances with and preferences among different speech forms. Haskins Laboratories: Status report on speech perception SR-41, 117–126.
Ortega, L. (2013). Understanding second language acquisition. Abingdon, UK: Routledge.
Proctor, C. P., Dalton, B., & Grisham, D. (2007). Scaffolding English language learners and struggling readers in a universal literacy environment with embedded strategy instruction and vocabulary support. Journal of Literacy Research, 39(1), 71–9.
Rose, D., & Dalton, B. (2002). Using technology to individualize reading instruction. In C. C. Block, L. B. Gambrell, & M. Pressley (Eds.), Improving comprehension instruction: Rethinking research, theory, and classroom practice (pp. 257–274). San Francisco: Jossey-Bass.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Smith, B. (2004). Computer-mediated negotiated interaction and lexical acquisition. Studies in Second Language Acquisition, 26, 365–398.
Soler-Urzua, F. (2011). The acquisition of English /ɪ/ by Spanish speakers via text-to-speech synthesizers: A quasi-experimental study (Master's Thesis). Concordia University, Montreal, CA.
Stern, S. E., Mullennix, J. W., & Yaroslavsky, I. (2006). Persuasion and social perception of human vs. synthetic voice across person as source and computer as source conditions. International Journal of Human-Computer Studies, 64(1), 43–52.
Stevens, C., Lees, N., Vonwiller, J., & Burnham, D. (2005). On-line experimental methods to evaluate text-to-speech (TTS) synthesis: Effects of voice gender and signal quality on intelligibility, naturalness and preference. Computer Speech & Language, 19(2), 129–146.
Sundberg, R., & Cardoso, W. (2016). Aligning out-of-class material with curriculum: Tagging grammar in a mobile music application. In S. Papadima-Sophocleous, L. Bradley, & S. Thouësny (Eds.), CALL communities and culture – short papers from EUROCALL 2016, Limassol, Cyprus (pp. 440–444). Dublin, IE: Research-publishing.net.
Tanaka, T. (2009). Communicative language teaching and its cultural appropriateness in Japan. Doshisha Studies in English, 84, 107–123.
Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. CALICO Journal, 28(3), 744–765.
Thomson, R. I. (2012). Improving L2 listeners’ perception of English vowels: A computer-mediated approach. Language Learning, 62(4), 1231–1258.
Trofimovich, P., Collins, L., Cardoso, W., White, J., & Horst, M. (2012). A frequency-based approach to L2 phonological learning: Teacher input and student output in an intensive ESL context. TESOL Quarterly, 46(1), 176–186.
VanPatten, B. (2007). Input processing in adult second language acquisition. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 115–135). New Jersey: Lawrence Erlbaum.
Webb, S. (2007). The effects of repetition on vocabulary knowledge. Applied Linguistics, 28(1), 46–65.