Jill Burstein
Educational Testing Service Rosedale Road, 18E Princeton, NJ 08541 jburstein@ets.org
Martin Chodorow
Department of Psychology Hunter College 695 Park Avenue New York, NY 10021 martin.chodorow@hunter.cuny.edu
Claudia Leacock
Educational Testing Service Rosedale Road, 18E Princeton, NJ 08541 cleacock@ets.org
Abstract
This paper describes a deployed educational technology application: the Criterion℠ Online Essay Evaluation Service, a web-based system that provides automated scoring and evaluation of student essays. Criterion has two complementary applications: e-rater®, an automated essay scoring system, and Critique Writing Analysis Tools, a suite of programs that detect errors in grammar, usage, and mechanics, identify discourse elements in the essay, and recognize elements of undesirable style. These evaluation capabilities give students feedback that is specific to their own writing, to help them improve their writing skills. Both applications employ natural language processing and machine learning techniques. All of these capabilities outperform baseline algorithms, and some of the tools agree with human judges as often as two judges agree with each other.
2. Application Description
Criterion contains two complementary applications that are based on natural language processing (NLP) methods. The scoring application, e-rater®, extracts linguistically based features from an essay and uses a statistical model of how these features relate to overall writing quality to assign a holistic score to the essay. The second application, Critique, comprises a suite of programs that evaluate and provide feedback on errors in grammar, usage, and mechanics, identify the essay's discourse structure, and recognize undesirable stylistic features. See Appendices for sample evaluations and feedback.
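The feature-based scoring idea described above can be illustrated with a minimal sketch. It is not e-rater's actual implementation: the three surface features, the weights, the intercept, and the 1 to 6 score scale are all assumptions chosen for illustration, and in a real system the weights would be estimated from human-scored essays rather than fixed by hand.

```python
import math
import re

# Hypothetical feature weights and intercept, chosen only for illustration;
# they are not e-rater's actual features or model parameters.
WEIGHTS = {"log_length": 0.9, "type_token_ratio": 1.5, "avg_word_length": 0.4}
INTERCEPT = -3.0
MIN_SCORE, MAX_SCORE = 1, 6  # assumed holistic score scale


def extract_features(essay: str) -> dict:
    """Compute a few simple surface features from the essay text."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    if not words:
        return {name: 0.0 for name in WEIGHTS}
    return {
        "log_length": math.log(len(words)),
        "type_token_ratio": len(set(words)) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }


def holistic_score(essay: str) -> int:
    """Combine the features linearly, then round and clip to the score scale."""
    features = extract_features(essay)
    raw = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return max(MIN_SCORE, min(MAX_SCORE, round(raw)))


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(extract_features(sample))
    print(holistic_score(sample))
```

The linear combination is used here only because it is the simplest statistical model that maps a feature vector to a single score; the sketch makes no claim about which model form the deployed system uses.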
2.1. The E-rater
This paper appeared in the published proceedings of the fifteenth annual conference on innovative applications of artificial intelligence, held in Acapulco, Mexico, August 2003. Reposted on www.ets.org with permission of the Association for the Advancement of Artificial Intelligence.