
Representing Discourse Coherence: A Corpus-Based Study

Florian Wolf∗
University of Cambridge

Edward Gibson∗∗
Massachusetts Institute of Technology

This article aims to present a set of discourse structure relations that are easy to code and to develop criteria for an appropriate data structure for representing these relations. Discourse structure here refers to informational relations that hold between sentences in a discourse. The set of discourse relations introduced here is based on Hobbs (1985). We present a method for annotating discourse coherence structures that we used to manually annotate a database of 135 texts from the Wall Street Journal and the AP Newswire. All texts were independently annotated by two annotators. Kappa values of greater than 0.8 indicated good interannotator agreement. We furthermore present evidence that trees are not a descriptively adequate data structure for representing discourse structure: In coherence structures of naturally occurring texts, we found many different kinds of crossed dependencies, as well as many nodes with multiple parents. The claims are supported by statistical results from our hand-annotated database of 135 texts.

∗ Computer Laboratory and Genetics Department, Cambridge, CB3 0FD, U.K. E-mail: Florian.Wolf@cl.cam.ac.uk
∗∗ Department of Brain and Cognitive Sciences, Cambridge, MA 02139. E-mail: egibson@mit.edu.

Submission received: 15th June 2004; Revised submission received: 5th September 2004; Accepted for publication: 23rd October 2004

© 2005 Association for Computational Linguistics
Computational Linguistics Volume 31, Number 2

1. Introduction

An important component of natural language discourse understanding and production is having a representation of discourse structure. A coherently structured discourse here is assumed to be a collection of sentences that are in some relation to each other. This article aims to present a set of discourse structure relations that are easy to code and to develop criteria for an appropriate data structure for representing these relations.

There have been two kinds of approaches to defining and representing discourse structure and coherence relations. These approaches differ with respect to what kinds of discourse structure they are intended to represent. Some accounts aim to represent the intentional-level structure of a discourse; in these accounts, coherence relations reflect how the role played by one discourse segment with respect to the interlocutors' intentions relates to the role played by another segment (e.g., Grosz and Sidner 1986). Other accounts aim to represent the informational structure of a discourse; in these accounts, coherence relations reflect how the meaning conveyed by one discourse segment relates to the meaning conveyed by another discourse segment (e.g., Hobbs 1985; Marcu 2000; Webber et al. 1999).

Furthermore, accounts of discourse structure vary greatly with respect to how many discourse relations they assume, ranging from 2 (Grosz and Sidner 1986) to over 400 different coherence relations (reported in Hovy and Maier [1995]). However, Hovy and Maier (1995) argue that, at least for informational-level accounts, taxonomies with more relations represent subtypes of taxonomies with fewer relations. This means that different informational-level-based taxonomies can be compatible with each other; they differ with respect to how detailed or fine-grained a manner they represent informational structures of texts.

Going beyond the question of how different informational-level accounts can be compatible with each other, Moser and Moore (1996) discuss the compatibility of rhetorical structure theory (RST) (Mann and Thompson 1988) with the theory of Grosz and Sidner (1986). However, note that Moser and Moore (1996) focus on the question of how compatible the claims are that Mann and Thompson (1988) and Grosz and Sidner (1986) make about intentional-level discourse structure.

In this article, we aim to develop an easy-to-code representation of informational relations that hold between sentences or other nonoverlapping segments in a discourse monologue. We describe an account with a small number of relations in order to achieve more generalizable representations of discourse structures; however, the number is not so small that informational structures that we are interested in are obscured. The goal of the research presented is not to encode intentional relations in texts. We consider annotating intentional relations too difficult to implement in practice at this time. Note that we do not claim that intentional-level structure of discourse is not relevant to a full account of discourse coherence; it just is not the focus of this article.

The next section describes in detail the set of coherence relations we use, which are mostly based on Hobbs (1985). We try to make as few a priori theoretical assumptions about representational data structures as possible. These assumptions are outlined in the next section. Importantly, however, we do not assume a tree data structure to represent discourse coherence structures. In fact, a major result of this article is that trees do not seem adequate to represent discourse structures.

This article is organized as follows. Section 2 describes the procedure we used to collect a database of 135 texts annotated with coherence relations. Section 3 describes in detail the descriptional inadequacy of tree structures for representing discourse coherence, and Section 4 provides statistical evidence from our database that supports this claim. Section 5 offers some concluding remarks.

2. Collecting a Database of Texts Annotated with Coherence Relations

This section describes (1) how we defined discourse segments, (2) which coherence relations we used to connect discourse segments, and (3) how the annotation procedure worked.

2.1 Di
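
As an illustration of the data-structure question raised in the abstract and introduction, the following is a minimal Python sketch, not the authors' annotation format or tooling. It assumes that discourse segments are numbered in text order, that each coherence relation is a labeled, directed arc between two segments, that a "parent" is the source of an incoming arc, and that two arcs "cross" when their spans interleave in text order. The class `Relation`, the function names, and the labels `rel_a`, `rel_b`, `rel_c` are all hypothetical placeholders; the labels do not reflect the article's relation inventory.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Relation:
    """A labeled, directed arc between two segments (hypothetical type)."""
    source: int  # index of the segment the arc starts from
    target: int  # index of the segment the arc points to
    label: str   # placeholder label, not the article's inventory


def nodes_with_multiple_parents(relations):
    """Return segments that receive arcs from more than one other segment."""
    parents = {}
    for rel in relations:
        parents.setdefault(rel.target, set()).add(rel.source)
    return sorted(node for node, sources in parents.items() if len(sources) > 1)


def crossed_dependencies(relations):
    """Return pairs of arcs whose spans interleave in text order."""
    crossings = []
    for first, second in combinations(relations, 2):
        a, b = sorted((first.source, first.target))
        c, d = sorted((second.source, second.target))
        # Interleaving: one arc starts strictly inside the other and
        # ends strictly outside it.
        if a < c < b < d or c < a < d < b:
            crossings.append((first, second))
    return crossings


# Toy graph over four segments (0..3); the arcs are invented for illustration.
relations = [
    Relation(0, 2, "rel_a"),
    Relation(1, 3, "rel_b"),
    Relation(0, 3, "rel_c"),
]

multi_parent = nodes_with_multiple_parents(relations)
crossings = crossed_dependencies(relations)
print("segments with multiple parents:", multi_parent)
print("number of crossed arc pairs:", len(crossings))
```

In this toy graph, segment 3 has two parents (segments 0 and 1), and the arcs 0→2 and 1→3 cross. Either property rules out a tree with non-crossing branches over the same segments, which is why the sketch stores relations as a flat list of arcs rather than as nested constituents. The two checks are only necessary conditions; passing both does not by itself show that a structure is a tree.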
