Nishantha Medagoda, Ruvan Weerasinghe
University of Colombo School of Computing, Sri Lanka
nmedagoda@yahoo.com, arw@ucsc.lk

ABSTRACT
Open-ended questions are an essential part of survey questionnaires. They give researchers an opportunity to discover unanticipated information about the domain of study. However, they are difficult to process because they are unstructured: no possible answers are suggested, and the respondent is free to answer in his or her own words. This paper presents a method for categorizing responses to such open-ended survey questions using a document clustering technique. The algorithm employs several natural language processing techniques to extract a classification of responses automatically. Two experiments were carried out to determine the effectiveness of the proposed algorithm, and the results were promising.

Keywords: Open-ended questions, Clustering
1 INTRODUCTION
Open-ended questions in survey questionnaires are unstructured questions for which no possible answers are suggested and which the respondent is expected to answer in his or her own words. When building a questionnaire for a statistical survey, it is essential to include such questions in order to gather unanticipated information. They are typically WH-questions, beginning with “how”, “what”, “when”, “where”, or “why”. Because the respondent is free to write any answer related to the question, there is no specific format for the answers. Analyzing such responses therefore requires filtering the relevant sentences and words from them. Often, however, the responses to this type of