k-mean-clustering

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

ResearchonK-meansclusteringalgorithmName:SunShanshanCONTENTSIntroductionClusteringK-meansClustingmethodsResultsConclusionsINTRODUCTIONWhatisclustering?Clusteringistheclassificationofobjectsintodifferentgroups,ormoreprecisely,thepartitioningofadatasetintosubsetssothatthedataineachsubset(ideally)sharesomecommontrait-oftenaccordingtosomedefineddistancemeasure.CommonDistancemeasures:•Distancemeasurewilldeterminehowthesimilarityoftwoelementsiscalculatedanditwillinfluencetheshapeoftheclusters1.TheEuclideandistance(alsocalled2-normdistance)isgivenby:1.TheManhattandistance(alsocalled1-normdistance)isgivenby:||),(1ipiiyxyxd21)||(),(ipiiyxyxdINTRODUCTION3.Themaximumnormisgivenby:||max),(1iipiyxyxdINTRODUCTIONK-meansClusting:Thek-meansalgorithmisanalgorithmtoclusternobjectsbasedonattributesintokpartitons,wherekn.Itissimilartotheexpectation-maximizationalgorithmformixturesofGaussiansinthattheybothattempttofindthecentersofnaturalclustersinthedata.Itassumesthattheobjectattributesformavectorspace.INTRODUCTIONAnalgorithmforpartitioning(orclustering)NdatapointsintoKdisjointsubsetsSjcontainingdatapointssoastominimizethesum-of-squarescriterionwherexnisavectorrepresentingthethenthdatapointandujisthegeometriccentroidofthedatapointsinSj.INTRODUCTIONSimplyspeakingk-meansclusteringisanalgorithmtoclassifyortogrouptheobjectsbasedonattributes/featuresintoKnumberofgroup.Kispositiveintegernumber.Thegroupingisdonebyminimizingthesumofsquaresofdistancesbetweendataandthecorrespondingclustercentroid.INTRODUCTIONMETHODSHowtheK-MeanClusteringalgorithmworks?METHODSStep1:Beginwithadecisiononthevalueofknumberclusters.K=2Step2:Putanyinitialpartitionthatclassifiesthedataintokclusters.Youmayassignthetrainingsamplesrand-omly,orsystematicallyasthefollowing:1.Takethefirstktrainingsampleassingle-elementclustersMETHODS2.Assigneachoftheremaining(N-k)trainingsampletotheclusterwiththenearestcentroid.Aftereachassignment,recomputehecentroidofthegainingcluster.Step3:Takeeachsampleinsequenceandcomputeitsdist-ancefromthecentroidofeachoftheclusters.Ifasampleisnotcurrentlyintheclusterwiththeclosestcentroid,switchthissampletothatclusterandupdat-ethecentroidoftheclustergainingthenewsampleandtheclusterlosingthesample.METHODSMETHODSStep4.Repeatstep3untilconvergenceisachived,thatisuntilapassthroughthetrainingsamplecausesnonewassignments.Weobtainresultthatshownintheaboveprocesses.Comparingthegroupingoflastiterationandthisiterationrevealsthattheobjectsdoesnotmovegroupanymore.Thus,thecomputationofthek-meanclusteringhasreacheditsstabilityandnomoreiterationisneeded..Itisrelativelyefficientandfast.ItcomputesresultatO(t*k*n),wherenisnumberofobjectsorpoints,kisnumberofclustersandtisnumberofiterations.RESULTSRESULTSk-meansclusteringcanbeappliedtomachinelearningordataminingUsedonacousticdatainspeechunderstandingtoconvertwaveformsintooneofkcategories(knownasVectorQuantizationorImageSegmentation).AlsousedforchoosingcolorpalettesonoldfashionedgraphicaldisplaydevicesandImageQuantization.CONCLUSIONK-meansalgorithmisusefulforundirectedknowledgediscoveryandisrelativelysimple.K-meanshasfoundwidespreadusageinlotoffields,rangingfromunsupervisedlearningofneuralnetwork,Patternrecognitions,Classificationanalysis,Artificialintelligence,imageprocessing,machinevision,andmanyothers.

1 / 17
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功