
Automatically augmenting academic text for language learning: PhD abstract corpora with the British Library


Wu, Shaoqun, Fitzgerald, Alannah ORCID: https://orcid.org/0000-0003-0392-2740, Witten, Ian and Yu, Alex (2018) Automatically augmenting academic text for language learning: PhD abstract corpora with the British Library. In: Zou, Bin and Thomas, Michael, (eds.) Handbook of Research on Integrating Technology Into Contemporary Language Learning and Teaching. IGI Global. ISBN 9781522551409

Text (application/pdf)
fitzgerald-chap25_zou-2018-book.pdf - Published Version
Available under License Spectrum Terms of Access.

Official URL: https://www.igi-global.com/book/handbook-research-...


This chapter describes the automated FLAX language system (flax.nzdl.org), which extracts salient linguistic features from academic text and presents them in an interface designed for L2 students who are learning academic writing. Typical lexico-grammatical features of any word or phrase, as well as collocations and lexical bundles, are automatically identified and extracted from a corpus; learners can explore them by searching and browsing, and inspect them together with their surrounding context. This chapter uses a single running example, the 9.8-million-word PhD abstracts corpus derived from the open access Electronic Theses Online Service (EThOS) at the British Library, but the approach is fully automated and can be applied to any collection of English writing. Implications for reusing open access publications for non-commercial educational and research purposes are presented for discussion. Design considerations for developing teaching and learning applications that focus on the rhetorical and lexico-grammatical patterns found in the abstract genre are also discussed.
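As a rough illustration of what automatically identifying lexical bundles can involve, the sketch below counts recurrent four-word sequences across a small set of texts and keeps those that meet both a frequency threshold and a range threshold (the number of distinct texts they occur in), which is the commonly used frequency-plus-range definition of a lexical bundle. This is a minimal sketch under stated assumptions, not the FLAX implementation: the function names `tokenize` and `extract_bundles`, the naive lowercase tokenizer, and the threshold values are hypothetical, and raw counts stand in for the per-million-word normalisation usually applied to large corpora.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer: lowercase word tokens, punctuation dropped.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return {n-gram: total frequency} for n-grams meeting both thresholds.

    min_freq:  total occurrences across all texts (raw count, an assumption).
    min_range: number of distinct texts the n-gram must appear in.
    """
    freq = Counter()       # total occurrences of each n-gram
    doc_range = Counter()  # number of texts containing each n-gram
    for text in texts:
        tokens = tokenize(text)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        doc_range.update(set(grams))  # count each n-gram at most once per text
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and doc_range[gram] >= min_range
    }


if __name__ == "__main__":
    abstracts = [
        "The aim of this thesis is to examine the role of memory.",
        "This study shows that the role of the teacher is central.",
        "The aim of this study is to assess the role of the state.",
    ]
    for bundle, count in sorted(extract_bundles(abstracts).items()):
        print(count, bundle)
```

On the three toy sentences this prints `2 the aim of this` and `2 the role of the`; sequences that recur only within a single text are filtered out by the range condition, which is why range is usually applied alongside raw frequency.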

Divisions: Concordia University > Faculty of Arts and Science > Education
Item Type: Book Section
Authors: Wu, Shaoqun and Fitzgerald, Alannah and Witten, Ian and Yu, Alex
Editors: Zou, Bin and Thomas, Michael
Date: 23 February 2018
  • FLAX Language Project flax.nzdl.org
Digital Object Identifier (DOI): 10.4018/978-1-5225-5140-9.ch025
Keywords: L2 academic writing, lexico-grammatical patterns, collocations, lexical bundles, abstracts, English for Academic Purposes (EAP)
ID Code: 983413
Deposited On: 23 Jan 2018 19:32
Last Modified: 23 Jan 2018 19:39