TP391

The Basic Processing of Contemporary Chinese Corpus at Peking University SPECIFICATION

YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, Bing SWEN
(Institute of Computational Linguistics, Peking University, Beijing, 100871)

Abstract: The Institute of Computational Linguistics, Peking University has completed the basic processing of a contemporary Chinese corpus that has 27 million Chinese characters. In addition to word segmentation and part-of-speech tagging, the processing involves the tagging of proper nouns (person names, place names, organization names and so on), morpheme subcategories and the special usages of verbs and adjectives. The success of this large-scale language engineering is attributed to the SPECIFICATION, which had been made beforehand and was being perfected while in use. We are hereby making an introduction to the SPECIFICATION through this publication, thus inviting the comments from all the experts and our colleagues for the improvement of it.

Keywords: contemporary Chinese; corpus; word segmentation; part-of-speech tagging; specification