Patterns of misspellings in L2 and L1 English: a view from the ETS Spelling Corpus

Keywords

misspellings
learner corpus
annotation
writing proficiency
word length
word frequency

How to Cite

Flor, Michael, Yoko Futagi, Melissa Lopez, and Matthew Mulholland. 2015. “Patterns of Misspellings in L2 and L1 English: A View from the ETS Spelling Corpus”. Bergen Language and Linguistics Studies 6 (May). https://doi.org/10.15845/bells.v6i0.811.

Abstract

This paper presents a study of misspellings based on annotated data from the ETS Spelling Corpus. The corpus consists of 3,000 essays written by examinees, both native speakers (NS) and non-native speakers (NNS) of English, on the writing sections of the GRE® and TOEFL® examinations. We find that the rate of misspellings decreases as writing proficiency (essay score) increases, in both TOEFL and GRE. The severity of misspellings depends on writing proficiency and not on the NS/NNS distinction. Word length and word frequency strongly influence the production of misspellings, showing patterns associated with proficiency. For word frequency, there is also a clear effect of the NS/NNS distinction.
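The two quantities named in the abstract, misspelling rate per score level and misspelling severity, can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: severity is approximated here by plain Levenshtein edit distance between a misspelling and its correction, and the rate is computed as misspelled tokens divided by all tokens, pooled per essay score. The function names and the toy numbers are hypothetical and are not taken from the ETS Spelling Corpus.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions).

    Assumption: used here as a stand-in for misspelling severity.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rates_by_score(essays):
    """Pool token and misspelling counts per essay score.

    essays: iterable of (score, n_tokens, n_misspellings) tuples.
    Returns {score: misspellings / tokens}, ordered by score.
    """
    tokens = defaultdict(int)
    errors = defaultdict(int)
    for score, n_tokens, n_misspellings in essays:
        tokens[score] += n_tokens
        errors[score] += n_misspellings
    return {s: errors[s] / tokens[s] for s in sorted(tokens) if tokens[s] > 0}


# Invented toy data, for illustration only.
toy_essays = [(1, 100, 10), (1, 100, 6), (5, 200, 2)]
toy_rates = rates_by_score(toy_essays)
toy_severity = edit_distance("definately", "definitely")
```

A decreasing rate from the lowest to the highest score in `toy_rates` would correspond to the pattern the abstract reports. Plain Levenshtein distance counts a transposition such as "recieve" for "receive" as two edits, so a Damerau-style variant may be a better fit where transpositions should count once.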
https://doi.org/10.15845/bells.v6i0.811

Copyright (c) 2015 Michael Flor, Yoko Futagi, Melissa Lopez, Matthew Mulholland

This work is licensed under a Creative Commons Attribution 3.0 International License.
