Contents

Chapter 1 Introduction
  1.1 Research background and significance
  1.2 Difficulties in compiling statistics on Chinese character strokes
    1.2.1 Large character inventory
    1.2.2 Similar glyphs and complex structures
    1.2.3 Non-unique stroke order
  1.3 Work of this thesis
Chapter 2 Overview of Chinese character strokes
  2.1 Stroke features
  2.2 Stroke order
  2.3 Stroke-based input methods
  2.4 Classification of strokes
  2.5 Chapter summary
Chapter 3 Statistics on Chinese character stroke data
  3.1 Overview of Chinese characters
  3.2 Usage frequency of Chinese characters
  3.3 Sources of the raw statistical data
  3.4 Significance of stroke statistics
  3.5 Preparatory work for stroke statistics
    3.5.1 Merging the two tables into one
    3.5.2 Sorting by code
  3.6 Statistics on the various average stroke counts of Chinese characters
    3.6.1 Arithmetic mean stroke count
    3.6.2 Frequency-weighted mean stroke count
    3.6.3 Arithmetic mean of the number of leading strokes needed to distinguish a character from all others
    3.6.4 Frequency-weighted mean of the number of leading strokes needed to distinguish a character from all others
  3.7 Statistics on other stroke data
    3.7.1 Number of characters beginning with each stroke type
    3.7.2 Frequency of each stroke type among the 6763 characters
    3.7.3 Characters with identical strokes
    3.7.4 Statistics on consecutive strokes
  3.8 Chapter summary
Chapter 4 Applications of the stroke statistics
  4.1 Application to stroke-based Chinese input methods
  4.2 Application to online handwriting recognition systems for Chinese characters
  4.3 Chapter summary
Conclusion
References
Appendix

Statistics on Chinese Character Stroke Data and Their Applications
Xiang Yan, College of Mathematics and Computer Science

Abstract: Chinese characters are the principal script in longest continuous use by the Chinese nation to date, and the only one of the major writing systems of antiquity that has been handed down to the present. Ancient, complex, and varied, Chinese characters are square characters laid out in a two-dimensional plane and composed of strokes. Research on, and implementation of, stroke-based Chinese input methods and online handwriting recognition systems must rest on statistics about the stroke information of Chinese characters. These statistics are a prerequisite for building such systems and provide important guidance. This thesis compiles several statistics on Chinese character strokes and studies their applications. The statistics mainly include: the arithmetic mean stroke count of Chinese characters; the mean stroke count weighted by usage frequency; the arithmetic mean of the number of leading strokes needed to distinguish a character from all others; the frequency-weighted mean of that same number; the number of characters beginning with each stroke type; the frequency of each stroke type among the 6763 characters of the two-level character set; characters with identical strokes; and the frequency of consecutive (adjacent) strokes.

Keywords: Chinese characters; strokes; usage frequency

Statistics and Applications of Some Chinese Character Stroke Data
Xiang Yan, College of Mathematics and Computer Science

Abstract: Chinese characters are the main script that the Chinese nation has used continuously for the longest time, and are also the only characters inherited from the major writing systems of ancient times. Ancient, complex, and diverse, the characters are two-dimensional and are constituted by strokes. To study and implement stroke-based Chinese character input methods and online handwriting recognition systems for Chinese characters, we must build on statistical data about Chinese character strokes. These statistics are the premise of stroke-based Chinese character input methods and online handwriting recognition systems, and have important guiding significance. This article focuses on statistical data about the strokes of Chinese characters and studies their application. The data include the arithmetic average number of strokes of Chinese