
Demographic and Geographic Drivers of Scientific Trends: Covariate Effects in Canadian NSERC Proposals

Tavakoli Kafiabad, Shirin (2025) Demographic and Geographic Drivers of Scientific Trends: Covariate Effects in Canadian NSERC Proposals. Masters thesis, Concordia University.

Text (application/pdf)
TavakoliKafiabad_MA__F2025.pdf - Accepted Version
Available under License Spectrum Terms of Access.
1MB

Abstract

Optimizing national scientific investment requires a clear understanding of evolving research trends and the demographic and geographical forces shaping them, particularly in light of commitments to equity, diversity, and inclusion. This thesis investigates how researcher gender and provincial location influence the prevalence and evolution of research topics over 18 years (2005–2022) of proposals funded by the Natural Sciences and Engineering Research Council of Canada (NSERC). To address this objective, we conducted a comprehensive comparative evaluation of three topic modeling approaches: Latent Dirichlet Allocation (LDA), Structural Topic Modeling (STM), and transformer-based BERTopic.
A key innovation is the COFFEE pipeline, a novel Python tool that enables robust covariate effect estimation for BERTopic. This addresses a significant gap: unlike the probabilistic STM, BERTopic has no native function for covariate analysis. Our findings show that while all three models effectively delineate core scientific domains, BERTopic outperformed the other two by consistently identifying more granular, coherent, and emergent themes, such as the rapid expansion of artificial intelligence. The covariate analysis, powered by COFFEE, also confirmed distinct provincial research specializations and revealed consistent gender-based thematic patterns across scientific disciplines. These insights offer an empirical foundation for funding organizations to formulate more equitable and impactful funding strategies, thereby enhancing the effectiveness of the scientific ecosystem.

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Thesis (Masters)
Authors: Tavakoli Kafiabad, Shirin
Institution: Concordia University
Degree Name: M.A. Sc.
Program: Quality Systems Engineering
Date: 14 October 2025
Thesis Supervisor(s): Schiffauerova, Andrea
Keywords: Research Investment, Research Trends, Large Language Models, Natural Language Processing, Topic Modelling, COFFEE Algorithm
ID Code: 996396
Deposited By: Shirin Tavakoli Kafiabad
Deposited On: 29 Jun 2026 14:52
Last Modified: 29 Jun 2026 14:52
