Automatic Scoring of Short Handwritten Essays in Reading Comprehension Tests

Sargur Srihari, Jim Collins, Rohini Srihari, Harish Srinivasan, Shravya Shetty, and Janina Brutt-Griffler

TR-01-07
June 2007

Center of Excellence for Document Analysis and Recognition (CEDAR)
University at Buffalo, State University of New York
520 Lee Entrance, Suite 202
Amherst, New York 14228, U.S.A.
srihari@cedar.buffalo.edu

Abstract. Reading comprehension is largely tested in schools using handwritten responses. The paper describes computational methods of scoring such responses using handwriting recognition and automatic essay scoring technologies. The goal is to assign to each handwritten response a score which is comparable to that of a human scorer even though machine handwriting recognition methods have high transcription error rates. The approaches are based on coupling methods of document image analysis and recognition together with those of automated essay scoring. Document image-level operations include: removal of pre-printed matter, segmentation of handwritten text lines and extraction of words. Handwriting recognition is based on a fusion of analytic and holistic methods together with contextual processing based on trigrams. The lexicons to recognize handwritten words are derived from the reading passage, the testing prompt, answer rubric and student responses. Recognition methods utilize children's handwriting styles. Heuristics derived from reading comprehension research are employed to obtain additional scoring features. Results with two methods of essay scoring – both of which are based on learning from a human-scored set – are described. The first is based on latent semantic analysis, which requires a reasonable level of handwriting recognition performance. The second uses an artificial neural network which is based on features extracted from the handwriting image. A test-bed of essays written in response to prompts in statewide reading comprehension tests and scored by humans is used to train and evaluate the methods. End-to-end performance results are not far from automatic scoring based on perfect manual transcription, thereby demonstrating that handwritten essay scoring is a topic with practical potential.

Keywords: Automatic Essay Scoring, Contextual Handwriting Recognition, Reading Comprehension, Latent Semantic Analysis, Artificial Neural Networks

1 INTRODUCTION

Reading comprehension is an important component of learning in schools. Tasks that require students to write about texts are ubiquitous at all levels of schooling and assessment, and low-performing writers have difficulty with such tasks. For example, a recent New York State assessment of fourth grade English language arts asked students to write after reading an essay and a poem about whales, and the prompt clearly specified that students should use information from the texts they had read in their responses. The test prompt and two responses were as follows.

Test Prompt: Do you think that fishing boats should be allowed in waters where whales swim? Why or why not? Use details from BOTH the article and the poem to support your answer. In your answer, be sure to
– State your opinion,
– Explain your reasons for this opinion,
– Support your opinion using information from BOTH the article and the poem.

Low Scoring Response: “They should not be aloud where whale are. Because whale need to siw or they will die.”

High Scoring Response: “I think fishing boats should not be allowed where whales are because the people might hurt the whale or get it in the fishing net and the whale might eat the fish in the fishing net and the people might throw a spear at it. They might even go and kill the whale for no reason whatsoever. They might even hurt the whale with the boat and it might get killed that way. That is why I think that fishing boats and not allowed where whales are.”

Whereas the second writer presents a relatively full, logically connected, and error-free response, the first writer uses information minimally, far from the extent necessary to form a skilled argument.

While electronically written responses are becoming the standard for college level entrance testing, handwritten responses are the principal means in state-wide testing in schools. This is due to issues such as how early to introduce keyboarding skills, academic integrity with closely spaced test stations, network down-time during testing, etc. Since the approach of using handwritten essays in reading comprehension evaluation is efficient and reliable it is likely to remain a key component of learning.

Writing done by hand is the primary means of testing students on state assessments. Consider as an example the New York State English Language Assessment (ELA) administered statewide in grades 5 and 8. In the reading part of the test the student is asked to read a passage such as that given in Fig. 1, which is a grade 8 example, and respond to several prompts in writing. An example prompt is: “How was Martha Washington's role as First Lady different from that of Eleanor Roosevelt? Use information from American First Ladies in your answer.” The completed answer sheets of three different students to the prompt are given in Fig. 2. The responses are scored by human assessors on a seven-point scale of 0-6. A rubric for the scoring is given in Table 1. This is referred to as a holistic rubric – which is in contrast to an analytic rubric that captures several writing traits.

Assessing large numbers of handwritten responses is a relatively time-consuming and monotonous task. At the same time there is an intense need to speed up and enhance th