Annotation of Corpus Texts


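CUI Gang, SHENG Yongmei (Department of Foreign Languages, Tsinghua University, Beijing 100084)

Abstract: Corpus annotation is the key step in making raw corpus text machine-readable, and it is also an important research topic in corpus linguistics. This paper draws on relevant research in China and abroad, and on the annotation practice of several large English corpora built outside China. It introduces and discusses the principles, schemes and types of corpus annotation, as a reference for those building English corpora in China.

Keywords: corpus; corpus text; annotation

CLC number: H087    Document code: B    Article ID: 1000-0062(2000)01-0089-06

Received: 1999-09-05. About the authors: CUI Gang (1966- ) […]; SHENG Yongmei (1976- ) […].
Journal of Tsinghua University (Philosophy and Social Sciences), 2000, Vol. 15, No. 1

[…] (McEnery & Wilson, 1996) […]

Leech (1993) proposes the following principles of corpus annotation:
1. It should be possible to remove the annotation and recover the raw corpus.
2. It should be possible to extract the annotation by itself, apart from the text.
3. The annotation scheme should be based on guidelines that are available to the end user.
4. It should be made clear how, and by whom, the annotation was carried out.
5. The end user should be made aware that corpus annotation is not infallible, but simply a potentially useful tool.
6. Annotation schemes should be based as far as possible on widely agreed, theory-neutral analyses of the data.
7. No annotation scheme has an a priori right to be regarded as the standard.
[…]

COCOA references are used by OCP (the Oxford Concordance Program), by the Longman-Lancaster corpus and by […]. A COCOA reference pairs a code letter, which names a category of information, with the value of that category. A stands for "author", so a text by Shakespeare is marked <A SHAKESPEAR>. COCOA references can record several kinds of information about a text […].

The TEI (Text Encoding Initiative) (McEnery & Wilson, 1996) […]. The British National Corpus is encoded according to the TEI. The TEI was sponsored by three bodies:
- the Association for Computational Linguistics (ACL);
- the Association for Literary and Linguistic Computing (ALLC);
- the Association for Computers and the Humanities (ACH).

The TEI is based on SGML (Standard Generalized Markup Language) […]. Each TEI text carries a header […].

TEI annotation uses two devices: tags and entity references. […] An element is opened by a start tag <...> and closed by an end tag </...>. A paragraph, for example, is enclosed in <p> ... </p>. Entity references are defined in a feature system declaration (FSD). Each entity reference begins with & and ends with ;. In the part-of-speech tag vvd, for instance:
- the first v marks a verb;
- the second v marks a lexical verb;
- d marks the past-tense form.

So contained&vvd; and contained_vvd both mark contained as the past tense of a lexical verb.

The tags and entities a document may use are specified in a document type definition (DTD). […] The DTD mechanism comes from SGML and is used by the TEI […].

Corpus annotation covers several types […]:

1. Text information. […] qcea.tag […] QCE […] (tag) […] A […]. Such information can be recorded with COCOA references or in a TEI header […]. The following TEI header is taken from McEnery & Wilson (1996: 32):

Example 1.
    <TEIHEADER>
    <FILEDESC>
    <TITLESTMT>
    <TITLE>Lives of the Saints from the Book of Lismore: an electronic edition</TITLE>
    <AUTHOR>Anonymous</AUTHOR>
    <RESPSTMT><RESP>compiled by</RESP><NAME>Elva Johnston</NAME></RESPSTMT>
    </TITLESTMT>
    <EDITIONSTMT>
    <EDITION N="1">First draft, revised and corrected. <DATE>1993-04-30</DATE></EDITION>
    <RESPSTMT><RESP>Proof correction by</RESP><NAME>Dr Nicole Meller</NAME>

This header records the following:
- the text is Lives of the Saints from the Book of Lismore: an electronic edition, by an anonymous author;
- it was compiled by Elva Johnston;
- the first draft was revised and corrected on 30 April 1993;
- proof correction was done by Nicole Meller.

2. Part-of-speech tagging. […] The COBUILD tagset contains the following tags:

    BE    be
    BED   were
    BEDZ  was
    BEG   being
    BEM   am
    BEN   been
    BER   are
    CC, CD, CS, DEM
    DO    do (with DOD and DOZ for its other forms)
    DT, DTG, DTP
    EX    existential there
    HV    have (with HVD, HVG, HVN and HVZ for its other forms)
    IN, JJ, MD
    NEG   not
    NN, NNS, NP, and pronoun tags such as PN, PPL, PPLS, PPO and PPS
    RB, TO
    UH    interjections (yes, ugh, um)
    VB, VBD, VBG (-ing form), VBN, VBZ
    WH    wh-words

3. Lemmatization. […] For example, had, has and having are all reduced to the lemma have. […] (Beale, 1987). The SUSANNE corpus, developed by Geoffrey Sampson, gives each word's tag and lemma, as in Example 2:

Example 2.
    N12:0510g  _  PPHS1m  He       he
    N12:0510h  _  VVDv    studied  study
    N12:0510i  _  AT      the      the
    N12:0510j  _  NN1c    problem  problem
    N12:0510k  _  IF      for      for
    N12:0510m  _  DD221   a        a
    N12:0510n  _  DD222   few      few
    N12:0510p  _  NNT2    seconds  second
    N12:0520a  _  CC      and      and
    N12:0520b  _  VVDv    thought  think
    N12:0520c  _  IO      of       of
    N12:0520d  _  AT1     a        a
    N12:0520e  _  NNc     means    means
    N12:0520f  _  IIb     by       by
    N12:0520g  _  DDQr    which    which
    N12:0520h  _  PPH1    it       it
    N12:0520i  _  VMd     might    may
    N12:0520j  _  VB0     be       be
    N12:0520k  _  VVNt    solved   solve
    N12:0520m  _          +.       .

4. Syntactic annotation (parsing). […] the British National Corpus (BNC), the Lancaster-Leeds Treebank and the Spoken English Corpus […]. Take the sentence Claudia sat on a stool. The labels used are:
- S = sentence
- NP = noun phrase
- VP = verb phrase
- PP = prepositional phrase
- N = noun
- V = verb
- P = preposition
- AT = article

Example 3. […]

The same structure can also be written as labelled bracketing […] (BNC) […]:

Example 4.
    [S[NP Claudia_NP1 NP][VP sat_VVD [PP on_II [NP a_AT1 stool_NN1 NP]PP]VP]S]

Parsing can be either full parsing or skeleton parsing. […] Example 5, from the Lancaster-Leeds Treebank, illustrates full parsing. Example 6, from the Spoken English Corpus, illustrates skeleton parsing:

Example 5.
    [S[Ncs another_DT new_JJ style_NN feature_NN Ncs][Vzb is_BEZ Vzb]
    [Ns the_ATI [NN/JJ& wine-glass_NN [JJ+ or_CC flared_JJ JJ+]NN/JJ&] heel_NN ,_,
    [Fr[Nq which_WDT Nq][Vzp was_BEDZ shown_VBN Vzp][Tn[Vn teamed_VBN Vn][R up_RP R]
    [P with_INW [Np[JJ/JJ/NN& pointed_JJ ,_, [JJ squared_JJ JJ] ,_,
    [NN+ and_CC chisel_NN NN+]JJ/JJ/NN&] toes_NNS Np]P]Tn]Fr]Ns] ._. S]

Example 6.
    [S&[P For_IF [N the_AT members_NN2 [P of_IO [N this_DD1 university_NNL1 N]P]N]P]
    [N this_DD1 charter_NN1 N][V enshrines_VVZ [N a_AT1 victorious_JJ principle_NN1 N]V]S&]
    ;_; and_CC
    [S+[N the_AT fruits_NN2 [P of_IO [N that_DD1 victory_NN1 N]P]N]
    [V can_VM immediately_RR be_VB0 seen_VVN [P in_II [N the_AT international_JJ community_NNJ
    [P of_IO [N scholars_NN2 N]P][Fr that_CST [V has_VHZ graduated_VVN here_RL today_RT V]Fr]N]P]V]S+] ._.

The two examples differ in how finely they label constituents. Example 5 subclassifies them: noun phrases appear as Ncs, Ns, Nq and Np, and verb groups as Vzb, Vzp and Vn. Example 6 labels every noun phrase simply N and every verb group V. […]

5. Semantic annotation. […] Janssen (1990) draws on the "field codes" of the Longman Dictionary of Contemporary English […]. […] Klaus Schmidt […]. Wilson (McEnery & Wilson, 1996) […] uses the following codes:

    00000000  low-content word
    13010000  plant life in general
    21030000  body and body parts
    21072000  object-oriented physical activity
    21110321  men's clothing: outer clothing
    21110400  headgear
    23241000  war and conflict: general
    31241100  colour

Example 7.
    And 00000000 the 00000000 soldiers 23241000 platted 21072000 a 00000000
    crown 21110400 of 00000000 thorns 13010000 and 00000000 put 21072000
    it 00000000 on 00000000 his 00000000 head 21030000 and 00000000
    they 00000000 put 21072000 on 00000000 him 00000000 a 00000000
    purple 31241100 robe 21110321

Wilson's codes in Example 7 draw on the scheme of Schmidt (1993) […].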
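The Python sketch below makes the hierarchical reading of Wilson's codes concrete. It splits the tagged text of Example 7 into word/code pairs and groups the content words by code prefix. It rests on two assumptions: each leading digit narrows the category, so codes that share a prefix fall in the same broader field, and 00000000 marks low-content words. The function names `parse_pairs` and `group_by_prefix` are hypothetical and are not part of any published annotation tool.

```python
from collections import defaultdict

# Example 7 as a stream of "word code" pairs, copied from the text.
EXAMPLE_7 = """
And 00000000 the 00000000 soldiers 23241000 platted 21072000
a 00000000 crown 21110400 of 00000000 thorns 13010000
and 00000000 put 21072000 it 00000000 on 00000000 his 00000000
head 21030000 and 00000000 they 00000000 put 21072000 on 00000000
him 00000000 a 00000000 purple 31241100 robe 21110321
"""

# Code given to low-content (function) words such as "the" and "of".
LOW_CONTENT = "00000000"


def parse_pairs(text):
    """Split a 'word code word code ...' stream into (word, code) tuples."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating words and codes")
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    for word, code in pairs:
        # Every code in Example 7 is exactly eight digits long.
        if len(code) != 8 or not code.isdigit():
            raise ValueError(f"bad semantic code {code!r} for {word!r}")
    return pairs


def group_by_prefix(pairs, depth):
    """Group content words by the first `depth` digits of their code.

    Assumption (hypothetical): a shorter prefix names a broader category.
    """
    groups = defaultdict(set)
    for word, code in pairs:
        if code != LOW_CONTENT:
            groups[code[:depth]].add(word.lower())
    # Sets drop the repeated "put"; sorting gives stable output.
    return {prefix: sorted(words) for prefix, words in groups.items()}


if __name__ == "__main__":
    pairs = parse_pairs(EXAMPLE_7)
    print(group_by_prefix(pairs, 1))  # top-level categories
    print(group_by_prefix(pairs, 4))  # finer groupings
```

At a depth of four digits, crown (21110400) and robe (21110321) fall together under the prefix 2111. At a depth of one digit, thorns and purple are the only content words outside top-level category 2. Sets are used inside the grouping so that a repeated word such as put is listed once per group.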
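The first digit of every code is 1, 2 or 3, naming one of three top-level categories […]. Each following digit narrows the category. The code of crown, 21110400, for instance, is read from the top-level category 2 down to the subcategory 4 […].

6. Discourse annotation. […] Stenström (1984) annotated the London-Lund Corpus of Spoken English with discourse tags […]. The tags fall into 16 categories, such as:
- apologies (sorry, excuse me);
- hedges (kind of, sort of);
- greetings (hello, good morning);
- politeness markers (please).

[…] Halliday and Hasan (1976), Cohesion in English […]. […] the Lancaster-Oslo/Bergen Corpus […].

The six types of annotation discussed above […]. […] (1998) […].

References:
[1] Beale, A. Towards a distributional lexicon. In Garside, R., Leech, G. & Sampson, G. (eds), The Computational Analysis of English: A Corpus-Based Approach. Longman, 1987.
[2] Halliday, M. & Hasan, R. Cohesion in English. Longman, 1976.
[3] Janssen, S. Automatic sense-disambiguation with LDOCE: enriching syntactically analyzed corpora with semantic data. In Aarts, J. & Meijs, W. (eds), Theory and Practice in Corpus Linguistics. Rodopi, 1990.
[4] Leech, G. Corpus annotation schemes. Literary and Linguistic Computing, 1993, 8(4): 275-469.
[5] McEnery, T. & Wilson, A. Corpus Linguistics. Edinburgh University Press, 1996.
[6] Schmidt, K. M. Begriffsglossar und Index zu Ulrichs von Zatzikhoven Lanzelet. Niemeyer, 1993.
[7] Stenström, A.-B. Discourse tags. In Aarts, J. & Meijs, W. (eds), Theory and Practice in Corpus Linguistics. Rodopi, 1984.
[8] […]. [J]. […], 1998(3): 17-28.
[9] […]. [J]. […], 1998(3): 4-12.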
