Patterns of misspellings in L2 and L1 English: a view from the ETS Spelling Corpus

Authors

  • Michael Flor, University of Bergen
  • Yoko Futagi
  • Melissa Lopez
  • Matthew Mulholland

DOI:

https://doi.org/10.15845/bells.v6i0.811

Keywords:

misspellings, learner corpus, annotation, writing proficiency, word length, word frequency

Abstract

This paper presents a study of misspellings based on annotated data from the ETS Spelling Corpus. The corpus consists of 3000 essays written by examinees, both native (NS) and non-native (NNS) speakers of English, on the writing sections of the GRE® and TOEFL® examinations. We find that the rate of misspellings decreases as writing proficiency (essay score) increases, in both TOEFL and GRE. The severity of misspellings depends on writing proficiency, not on the NS/NNS distinction. Word length and word frequency strongly influence the production of misspellings, with patterns associated with proficiency. For word frequency, there is also a clear effect of the NS/NNS distinction.
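The abstract does not say how misspelling rate or severity are computed. The following is a minimal sketch under two stated assumptions: rate is taken as misspellings per 100 word tokens, grouped by essay score, and severity is taken as the optimal string alignment (restricted Damerau-Levenshtein) edit distance between each misspelling and its correction. The record layout, the function names `osa_distance` and `rates_by_score`, and the toy essays are all hypothetical and are not data from the ETS Spelling Corpus.

```python
from collections import defaultdict
from statistics import mean


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions and
    substitutions (as in Levenshtein) plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


# Hypothetical toy records: (essay score, number of word tokens,
# [(misspelled form, corrected form), ...]). Illustrative only.
essays = [
    (2, 180, [("recieve", "receive"), ("becuase", "because"), ("thru", "through")]),
    (2, 150, [("goverment", "government"), ("wich", "which")]),
    (5, 420, [("occured", "occurred")]),
    (5, 390, []),
]


def rates_by_score(records):
    """Return {score: (misspellings per 100 tokens, mean edit distance)}."""
    tokens = defaultdict(int)
    errors = defaultdict(int)
    distances = defaultdict(list)
    for score, n_tokens, pairs in records:
        tokens[score] += n_tokens
        errors[score] += len(pairs)
        distances[score].extend(osa_distance(m, c) for m, c in pairs)
    return {
        s: (100 * errors[s] / tokens[s],
            mean(distances[s]) if distances[s] else 0.0)
        for s in sorted(tokens)
    }


if __name__ == "__main__":
    for score, (rate, severity) in rates_by_score(essays).items():
        print(f"score {score}: {rate:.2f} misspellings/100 words, "
              f"mean edit distance {severity:.2f}")
```

OSA is used here rather than unrestricted Damerau-Levenshtein because it is simpler and counts a swapped letter pair such as "recieve" for "receive" as a single edit; other severity measures would be equally plausible.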



Published

2015-05-30

How to Cite

Flor, Michael, Yoko Futagi, Melissa Lopez, and Matthew Mulholland. 2015. “Patterns of Misspellings in L2 and L1 English: A View from the ETS Spelling Corpus.” Bergen Language and Linguistics Studies 6 (May). https://doi.org/10.15845/bells.v6i0.811.