Tracing crosslinguistic influences in structural sequences: What does key structure analysis have to offer?


Ilmari Ivaska, University of Bergen



Keywords: Finnish as a second language, crosslinguistic influence, detection-based approach, key word analysis, key structure analysis


Following the detection-based approach, this article identifies statistically significant frequency differences in written Finnish produced by learners from various language backgrounds. The analysis of crosslinguistic influence is data-driven: it focuses on the morphological forms and their combinations (n-grams) that prove to be the best predictors of the learners' first languages. In line with the methodology applied, key structure analysis, the article then examines the detected n-grams in terms of their internal and cotextual variation in order to establish which linguistic phenomenon actually distinguishes the subsets of data. The results show several quantitative differences that may be due to crosslinguistic influence, all of them detected in a data-driven manner without prior hypotheses about potential differences. The method can be especially useful for finding and analysing elusive crosslinguistic influences that cannot be interpreted as direct transfer from the respective first languages.
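The first step described above, finding morphological n-grams whose frequencies differ between subcorpora, can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not name the test statistic, so the log-likelihood (G2) keyness measure common in key word analysis is assumed here, and the function names, the tag labels (`N_NOM`, `V_PRES`, and so on) and the toy data are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return all contiguous n-grams of a tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def log_likelihood(a, b, total_a, total_b):
    """G2 keyness of one item; a and b are its counts in corpora A and B.

    Assumption: G2 stands in for whatever statistic the article used.
    """
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def key_ngrams(corpus_a, corpus_b, n=2, threshold=3.84):
    """Rank tag n-grams whose frequency differs between two corpora.

    Each corpus is a list of sentences, each sentence a list of
    morphological tags. The default threshold 3.84 is the chi-square
    critical value for p < 0.05 with one degree of freedom.
    """
    counts_a = Counter(g for sent in corpus_a for g in ngrams(sent, n))
    counts_b = Counter(g for sent in corpus_b for g in ngrams(sent, n))
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    results = []
    for gram in set(counts_a) | set(counts_b):
        a, b = counts_a[gram], counts_b[gram]
        g2 = log_likelihood(a, b, total_a, total_b)
        if g2 >= threshold:
            results.append((gram, a, b, g2))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Hypothetical toy subcorpora from two first-language groups.
group_a = [["N_NOM", "V_PRES", "N_PART"]] * 20
group_b = [["N_NOM", "V_PRES", "N_GEN"]] * 20
for gram, a, b, g2 in key_ngrams(group_a, group_b):
    print(gram, a, b, round(g2, 2))
```

In this toy case only the bigrams ending in `N_PART` and `N_GEN` are flagged, since the shared bigram occurs equally often in both groups. In an actual study the flagged n-grams would then be inspected for their internal and cotextual variation, as the abstract describes.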


Aarts, J., and S. Granger. 1998. Tag sequences in learner corpora: A key to interlanguage grammar and discourse. In Learner English on computer, ed. S. Granger, 132-141. London: Longman.

Barlow, M., and S. Kemmer, eds. 2000. Usage-based models of language. Chicago: CSLI Publications.

Breiman, L. 2001. Random forests. Machine Learning 45 (1): 5-32.

Cheng, W., C. Greaves, and M. Warren. 2006. From n-gram to skipgram to concgram. International Journal of Corpus Linguistics 11 (4): 411-433.

Common European framework for languages: Learning, teaching, assessment 2006. Cambridge: Cambridge University Press.

Estival, D., T. Gaustad, S. B. Pham, W. Radford, and B. Hutchinson. 2007. Author profiling for English emails. In Proceedings of the 10th conference of the Pacific Association for Computational Linguistics (PACLING 2007), Melbourne, Australia, 31-39.

Francis, G. 1993. A corpus-driven approach to grammar: Principles, methods and examples. In Text and technology. In honour of John Sinclair, eds. M. Baker, G. Francis, and E. Tognini-Bonelli, 137-156. Amsterdam: John Benjamins.

Goldberg, A. 2006. Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.

Granger, S. 1996. From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In Languages in contrast, eds. K. Aijmer, B. Altenberg, and M. Johansson, 37-51. Lund: Lund University Press.

Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, analysis, and applications, ed. A. P. Cowie, 145-160. Oxford: Clarendon Press.

Granger, S. 2013. Contrastive Interlanguage Analysis: A Reappraisal. Keynote speech. In Learner Corpus Research Conference 2013. Bergen/Os, Norway.

Granger, S., and M. Paquot. 2008. Disentangling the phraseological web. In Phraseology: An interdisciplinary perspective, eds. S. Granger and F. Meunier, 27-49. Amsterdam: John Benjamins.

Gries, S. Th. 2008. Phraseology and linguistic theory. In Phraseology: An interdisciplinary perspective, eds. S. Granger and F. Meunier, 3-25. Amsterdam: John Benjamins.

Guthrie, D., B. Allison, W. Liu, L. Guthrie, and Y. Wilks. 2006. A closer look at skip-gram modelling. In Proceedings of the fifth international conference on language resources and evaluation (LREC), Genoa, Italy, 1222-1225.

Hothorn, T., P. Buehlmann, S. Dudoit, A. Molinaro, and M. Van Der Laan. 2006. Survival ensembles. Biostatistics 7 (3): 355-373.

Hunston, S. 2001. Colligation, lexis pattern, and text. In Patterns of Text: In honour of Michael Hoey, eds. M. Scott, and G. Thompson, 13-34. Amsterdam: John Benjamins.

Inaba, N. 2007. Mikael Agricolan teokset tietokannan muodossa. In Agricolan aika, eds. K. Häkkinen, and T. Vaittinen, 147-161. Helsinki: BTJ.

Itkonen, E. 2005. Analogy as structure and process. Amsterdam: John Benjamins.

Ivaska, I. 2012. Key structure analysis of formally defined structures of learner Finnish. Paper presented at the conference Learner Language, Learner Corpora, University of Oulu, 2012.

Ivaska, I. 2014a. Edistyneen oppijansuomen avainrakenteita. Korpusnäkökulma kahden kielimuodon tyypillisiin rakenteellisiin eroihin. Virittäjä 118:161-193.

Ivaska, I. 2014b. The corpus of advanced learner Finnish (LAS2). Database and toolkit to study academic learner Finnish. Apples: Journal of Applied Language Studies 8 (3): 21-38.

Ivaska, I., and K. Siitonen. 2009. Syntaktisesti koodattu oppijankielen korpus: mahdollisuuksia ja ongelmia. In Korpusuuringute metodoloogia ja märgendamise probleemid, eds. P. Eslon, and K. Õim, 54-71. Tallinn: Tallinna Ülikool.

Ivaska, I., and K. Siitonen. 2011. Avainrakenneanalyysi. Tapa tutkia oppijankielen lauserakennetta korpusvetoisesti. AFinLA-e 3:35-47.

Jantunen, J. H. 2004. Synonymia ja käännössuomi. Korpusnäkökulma samamerkityksisyyden kontekstuaalisuuteen ja käännöskielen leksikaalisiin erityispiirteisiin. Joensuu: Joensuun yliopistopaino.

Jantunen, J. H. 2009. “Minulla on aivan paljon rahaa”: Fraseologiset yksiköt suomen kielen opetuksessa. Virittäjä 113:356-381.

Jantunen, J. H. 2011. Avainsana-analyysi annotoidun oppijankieliaineiston tutkimisessa: Alustavia havaintoja. AFinLA-e 3:48-61.

Jarvis, S. 2000. Methodological rigor in the study of transfer: Identifying L1 influence in the interlanguage lexicon. Language Learning 50 (2): 245-309.

Jarvis, S. 2010. Comparison-based and detection-based approaches to transfer research. EUROSLA Yearbook 10:169-192.

Jarvis, S. 2011. Data mining with learner corpora: Choosing classifiers for L1 detection. In A taste for corpora. In honour of Sylviane Granger, eds. F. Meunier, S. De Cock, G. Gilquin, and M. Paquot, 131-158. Amsterdam: John Benjamins.

Jarvis, S. 2012. The detection-based approach: An overview. In Approaching language transfer through text classification, eds. S. Jarvis, and S. A. Crossley, 1-33. Bristol: Multilingual Matters.

Jarvis, S., and S. A. Crossley, eds. 2012. Approaching language transfer through text classification. Bristol: Multilingual Matters.

Jarvis, S., and A. Pavlenko. 2008. Crosslinguistic influence in language and cognition. London: Routledge.

Jarvis, S., G. Castañeda-Jimenez, and R. Nielsen. 2012. Detecting L2 writers’ L1s on the basis of their lexical styles. In Approaching language transfer through text classification, eds. S. Jarvis, and S. A. Crossley, 34-70. Bristol: Multilingual Matters.

Koppel, M., J. Schler, and K. Zigdon. 2005. Automatically determining an anonymous author’s native language. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, 624-628. Chicago: Association for Computing Machinery.

Lauseopin X-arkisto. n.d. School of Languages and Translation Studies of the University of Turku. Turku.

Mayfield Tomokiyo, L., and R. Jones. 2001. You're not from ’round here, are you? Naive Bayes detection of non-native utterance text. In Proceedings of the second meeting of the North American chapter of the Association for Computational Linguistics (NAACL ’01). Cambridge, MA: Association for Computational Linguistics.

Meunier, F., and S. Granger, eds. 2008. Phraseology in foreign language learning and teaching. Amsterdam: John Benjamins.

Nesselhauf, N. 2004. Collocations in a learner corpus. Philadelphia: John Benjamins.

Odlin, T. 1989. Language transfer: Cross-linguistic influence in language learning. Cambridge: Cambridge University Press.

Penttilä, R. 2008. Suomen ja-konjunktion vastineet ir, o ja pilkku liettuassa. Master’s thesis, University of Turku.

Pepper, S. 2012. Lexical transfer in Norwegian interlanguage: A detection-based approach. Master’s thesis, University of Oslo.

R Core Team. 2013. R: A language and environment for statistical computing. Vienna, Austria.

Räisänen, J. 2005. Suomen tempusten semantiikka tšekin- ja venäjänkielisten suomenoppijoiden välikielissä. Master’s thesis, University of Turku.

Scott, M. 2010. Problems in investigating keyness, or clearing the undergrowth and marking out trails... In Keyness in texts, eds. M. Bondi and M. Scott, 43-57. Amsterdam: John Benjamins.

Scott, M., and C. Tribble. 2006. Textual patterns. Key words and corpus analysis in language education. Amsterdam: John Benjamins.

Sinclair, J. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.

Sinclair, J. 2001. Reviews of the Longman grammar of spoken and written English. International Journal of Corpus Linguistics 6 (2): 339-359.

Skousen, R. 1989. Analogical modeling of language. Dordrecht: Kluwer Academic Publishers.

Stanford University. Natural Language Processing. n.d. Statistical natural language processing and corpus-based computational linguistics: An annotated list of resources.

Stefanowitsch, A., and S. Th. Gries. 2003. Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics 8 (2): 209-243.

Strobl, C., A.-L. Boulesteix, A. Zeileis, and T. Hothorn. 2007. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8 (25).

Strobl, C., A.-L. Boulesteix, T. Kneib, T. Augustin, and A. Zeileis. 2008. Conditional Variable Importance for Random Forests. BMC Bioinformatics 9 (307).

Strobl, C., J. Malley, and G. Tutz. 2009. An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological methods 14 (4): 323-348.

Tagliamonte, S., and R. H. Baayen. 2012. Models, forests and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24 (2): 135-178.

Université catholique de Louvain. Centre for English Corpus Linguistics. n.d. Learner corpora around the world.

Wiersma, W., J. Nerbonne, and T. Lauttamus. 2011. Automatically extracting typical syntactic differences from corpora. Literary and Linguistic Computing 26 (1): 107-124.

Wong, S.-M. J., and M. Dras. 2009. Contrastive analysis and native language identification. In Proceedings of the Australasian Language Technology Association, 53-61. Cambridge: MA: Association




How to Cite

Ivaska, Ilmari. 2015. “Tracing Crosslinguistic Influences in Structural Sequences: What Does Key Structure Analysis Have to Offer?”. Bergen Language and Linguistics Studies 6 (May).