Automated Essay Scoring With E-rater® v.2.0
Yigal Attali and Jill Burstein
Research & Development
November 2005 RR-04-45
As part of its educational and social mission and in fulfilling the organization's nonprofit charter and bylaws, ETS has and continues to learn from and also to lead research that furthers educational and measurement research to advance quality and equity in education and assessment for all users of the organization's products and services. ETS Research Reports provide preliminary and limited dissemination of ETS research prior to publication. To obtain a PDF or a print copy of a report, please visit: http://www.ets.org/research/contact.html
Copyright © 2005 by Educational Testing Service. All rights reserved. EDUCATIONAL TESTING SERVICE, E-RATER, ETS, the ETS logo, and TOEFL are registered trademarks of Educational Testing Service. TEST OF ENGLISH AS A FOREIGN LANGUAGE is a trademark of Educational Testing Service. CRITERION is a service mark of Educational Testing Service. GMAT and the GRADUATE MANAGEMENT ADMISSION TEST are registered trademarks of the Graduate Management Admission Council.
Abstract

E-rater® has been used by ETS for automated essay scoring since 1999. This paper describes a new version of e-rater (v.2.0) that differs from the previous one (v.1.3) with regard to the feature set and model-building approach. The paper describes the new version, compares the new and previous versions in terms of performance, and presents evidence on the validity and reliability of scores produced by the new version.

Key words: Automated essay scoring, e-rater, CriterionSM
E-rater® has been used by ETS for automated essay scoring since February 1999. Burstein, Chodorow, and Leacock (2003)
References

Breland, H. M., & Gaynor, J. L. (1979). A comparison of direct and indirect assessments of writing skill. Journal of Educational Measurement, 16, 119-128.

Breland, H. M., Jones, R. J., & Jenkins, L. (1994). The College Board vocabulary study (College Board Rep. No. 94-4; ETS RR-94-26). New York: College Entrance Examination Board.

Burstein, J., Chodorow, M., & Leacock, C. (2003, August). CriterionSM: Online essay evaluation: An application for automated evaluation of student essays. In J. Riedl & R. Hill (Eds.), Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico (pp. 3-10). Menlo Park, CA: AAAI Press.

Burstein, J., Marcu, D., & Knight, K. (2003). Finding the WRITE stuff: Automatic identification of discourse structure in student essays. IEEE Intelligent Systems: Special Issue on Natural Language Processing, 18(1), 32-39.

Haberman, S. (2004). Statistical and measurement properties of features used in essay assessment (ETS RR-04-21). Princeton, NJ: ETS.

Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18, 613-620.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Lawrence Erlbaum Associates.