Convolutional Neural Networks for Sentiment Classification
何云超  yunchaohe@gmail.com

Word Vectors
• Three ways to use word vectors in a CNN:
  • As network parameters: randomly initialized and learned during model training
  • Pre-trained with a word-vector model (word2vec, GloVe, etc.) and kept fixed during model training
  • Pre-trained with a word-vector model (word2vec, GloVe, etc.), used to initialize the network, and fine-tuned during model training

Sentence Matrix
• Each row (or each column) of the matrix is one word vector

Convolutional Layer
• Wide convolution
• Narrow convolution
• The red connections all have the same weight.
• Narrow: s − m + 1 = 7 − 5 + 1 = 3;  Wide: s + m − 1 = 7 + 5 − 1 = 11

Pooling Layer
• Max pooling: the idea is to capture the most important feature (the one with the highest value) for each feature map.

Dropout: A Simple Way to Prevent Neural Networks from Overfitting
• Consider a neural net with one hidden layer of H units.
• Each time we present a training example, we randomly omit each hidden unit with probability 0.5.
• So we are randomly sampling from 2^H different architectures.
• All architectures share weights.
• Dropout prevents units from co-adapting too much.

CNN for Sentence Classification [1]
• Two channels
• Model variants:
  • CNN-rand
  • CNN-non-static
  • CNN-static
  • CNN-multichannel

DCNN Overview [2]
• Convolutional neural networks with dynamic k-max pooling
• Wide convolution

Dynamic k-Max Pooling
• l: index of the current convolutional layer
• L: total number of convolutional layers
• s: sentence length
• k_top: the fixed pooling parameter of the topmost convolutional layer
• k_l = max(k_top, ⌈((L − l)/L) · s⌉)
• Example: if L = 3, s = 18, k_top = 3, then
  k_1 = max(3, ⌈(2/3) · 18⌉) = max(3, 12) = 12
  k_2 = max(3, ⌈(1/3) · 18⌉) = max(3, 6) = 6
  k_3 = k_top = 3

Folding
• Problem:
  • The convolution operates on each row independently
  • Complex dependencies are built up only within a row
  • Until the fully connected layer, different rows remain independent of one another
• Therefore:
  • The folding operation sums every two adjacent rows
  • d rows are reduced to d/2 rows
  • Each row then depends on two rows of the layer below

Semantic Clustering [3]
• Pipeline: sentence matrix → semantic candidate units → semantic units → semantic cliques
• m = 2, 3, …, sentence length / 2

seq-CNN [4]
• Inspired by multi-channel image representations (RGB, CMYK): treat the sentence as an image and each word as a pixel, so a d-dimensional word vector can be viewed as a pixel with d channels.
• Example (slide figure): vocabulary, sentence, sentence vector, multiple channels, with one-hot word vectors such as [0 0 0], [0 0 0], [1 0 0], [0 0 1], [0 1 0]

Enrich Word Vectors
• Use character-level embeddings: concatenate the word vector and the character-level vector to form the word's representation. [5]
• Extend word vectors with traditional text features, mainly: number of all-caps words, emoticons, elongated units, sentiment word counts, negation words, punctuation, clusters, and n-grams. [6]

MVCNN: Multichannel Variable-Size Convolution [7]
• Different word embeddings cover different vocabularies:
  • HLBL
  • Huang
  • GloVe
  • SENNA
  • word2vec
• Handling unknown words:
  • Randomly initialized
  • Projection (mutual learning): argmin ‖ŵ_j − w_j‖², where ŵ_j is the projected vector for word j

MVCNN: Training
• Pretraining
  • Unsupervised training: the average of the context word vectors serves as a predicted representation of the middle word
  • Used to produce good initial values
• Training
  • Logistic regression

References
[1] Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
[2] Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
[3] Wang, P., Xu, J., Xu, B., Liu, C.-L., Zhang, H., Wang, F., & Hao, H. (2015). Semantic Clustering and Convolutional Neural Network for Short Text Categorization. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Vol. 2, pp. 352-357).
[4] Johnson, R., & Zhang, T. (2015). Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
[5] dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), Dublin, Ireland.
[6] Tang, D., Wei, F., Qin, B., Liu, T., & Zhou, M. (2014). Coooolll: A Deep Learning System for Twitter Sentiment Classification. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 208-212).
[7] Yin, W., & Schütze, H. (2015). Multichannel Variable-Size Convolution for Sentence Classification. In Proceedings of the 19th SIGNLL Conference on Computational Natural Language Learning (CoNLL 2015), Beijing, China.

Thank You for Listening
Q&A
何云超  yunchaohe@gmail.com
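The dynamic k-max pooling rule and the worked example above (L = 3, s = 18, k_top = 3) can be sketched in a few lines of NumPy. This is a minimal illustration, not the DCNN paper's implementation; the function names are my own.

```python
import math
import numpy as np

def dynamic_k(l, L, s, k_top):
    """DCNN rule: k_l = max(k_top, ceil(((L - l) / L) * s))
    for convolutional layer l out of L on a sentence of length s."""
    return max(k_top, math.ceil((L - l) / L * s))

def k_max_pool(row, k):
    """Keep the k largest values of a 1-D feature map, preserving
    their original left-to-right order (unlike plain top-k sorting)."""
    top = np.sort(np.argsort(row)[-k:])  # positions of the k largest values
    return row[top]

# Reproduce the worked example: L = 3, s = 18, k_top = 3
ks = [dynamic_k(l, L=3, s=18, k_top=3) for l in (1, 2, 3)]
print(ks)  # [12, 6, 3]

# Order-preserving pooling on a toy feature map
print(k_max_pool(np.array([1., 5., 2., 9., 3.]), 2))  # [5. 9.]
```

Note that the pooling keeps positions sorted, so the selected features stay in sentence order; this is what lets stacked convolutional layers above the pool still read the sequence left to right.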