CORPUS TOOLS AND SOFTWARE: OVERVIEW AND APPLICATIONS
Abstract
Corpus tools and software have become essential for linguistic research, language pedagogy, and applied computational tasks. This article surveys commonly used corpus tools, including concordancers, keyword-in-context (KWIC) and frequency utilities, collocation analyzers, and word-sketch tools, and outlines their primary applications in research and teaching. It follows the stages of a typical workflow (corpus creation, cleaning, annotation, searching, and visualization) and maps software tools to each stage. The review covers desktop programs (e.g., AntConc, WordSmith Tools), web- and cloud-based systems (e.g., Sketch Engine, Voyant Tools), integrated suites (e.g., #LancsBox), and programming libraries (e.g., Python’s NLTK and spaCy). Practical applications include error analysis, phraseology and idiom extraction, curriculum design, automated assessment, and digital humanities projects. The article also considers methodological issues (tokenization, tagging accuracy, metadata design), usability (graphical versus command-line interfaces), and ethical and legal constraints (data protection, licensing). A comparative table (Table 1) helps readers choose appropriate tools for specific research or pedagogical goals. The recommendations are to match tool capabilities to research questions, to document every preprocessing step, and to combine tools whose strengths complement one another. The paper closes with pedagogical implications and directions for future tool development, stressing interoperability, reproducibility, and better support for under-resourced languages.
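To make the core operations named above concrete, the following minimal Python sketch shows roughly what a frequency list, a KWIC concordance, and a window-based collocate count compute. It is an illustration under stated assumptions, not the implementation used by any of the tools mentioned: the function names (`tokenize`, `kwic`, `collocates`), the naive regular-expression tokenizer, and the sample sentence are all hypothetical, and real tools add tagging, statistical association measures, and far more careful tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and extract word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for every occurrence of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

def collocates(tokens, node, span=2):
    """Count words found within `span` tokens of `node` (raw counts, no statistics)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Hypothetical sample text standing in for a corpus file.
sample = "She made a decision quickly. He made a mistake, and they made a decision too."
tokens = tokenize(sample)

freq = Counter(tokens)                 # frequency list
lines = kwic(tokens, "made", width=2)  # concordance lines for the node word

for left, node, right in lines:
    print(f"{left:>12} | {node} | {right}")
print(collocates(tokens, "made", span=2).most_common(3))
```

Because the tokenizer decides what counts as a word, changing its rule changes every downstream count, which is one reason preprocessing steps need to be documented alongside results.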
References
Granger, S. (1998). The computer learner corpus: A new resource for foreign language learning and teaching. Applied Linguistics, 19(4), 529–553.
Dagneaux, E., Granger, S., & Meunier, F. (1998). A learner corpus approach to error analysis. International Journal of Corpus Linguistics, 3(2), 161–182.
Nesselhauf, N. (2005). Collocations in a Learner Corpus. John Benjamins.
Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta-analysis. Language Learning, 67(2), 348–393.
Granger, S., Gilquin, G., & Meunier, F. (Eds.). (2015). The Cambridge Handbook of Learner Corpus Research. Cambridge University Press.
Anthony, L. (2019). AntConc: A freeware corpus analysis toolkit (Version 4.x) [Computer software]. Waseda University.
Scott, M. (1996). WordSmith Tools (Version 4) [Computer software]. Oxford University Press.
Kilgarriff, A., Baisa, V., Bušta, J., Jakubíček, M., Kovář, V., Michelfeit, J., … Suchomel, V. (2014). The Sketch Engine: Ten years on. Lexicography, 1(1), 7–36.

