Huazhong University of Science and Technology
Master's Thesis

Research on the Application of Link Analysis in Financial Supervision

Name: 薛蕾 (Xue Lei)
Degree applied for: Master
Major: Computer Application Technology
Supervisor: 刘芳 (Liu Fang)
Date: 2006-05-08

Abstract

Financial crimes are usually closely tied to exceptional fund flows in financial networks, yet some existing financial supervision techniques and measures cannot monitor many hidden forms of fund flow. Link analysis, a technique from the data mining field, can detect exceptional fund flows in financial networks effectively by analyzing features of the flows such as amount, trail, and frequency.

Tracking transactions one at a time makes it very hard to detect exceptional fund flows among an enormous number of transactions. Link analysis addresses this problem. The detection process is as follows: take accounts as the objects of detection, cluster the target accounts, and then calculate the suspicious rank of each account from its transaction frequency and the amount of money involved. Finally, find the exceptional fund flows by analyzing the exceptional accounts. The whole process consists of two phases: a clustering phase, and a phase that computes the suspicious rank and mines the exceptional fund flows.

Clustering is the process of discovering groups of accounts that transact with each other continually. Conventional data clustering algorithms identify groups of similar items in a data set from their attribute values only; little work in data clustering focuses on the relationships among the data. To overcome this limitation, a frequency level is first defined quantitatively from the transaction frequency and the amount of money exchanged between two accounts. Three clustering methods, all based on the frequency level, are then used:

1. Link-based vector clustering, which takes as its clustering objects the vectors that denote the relationships among accounts.
2. Graph partitioning-based clustering, whose goal is to partition the transaction network so that connections within clusters are maximized and connections between clusters are minimized.
3. Association-based clustering, which finds the maximal connected frequent sets using the theory of association rules and yields clusters whose accounts have stronger associations.

In the phase of computing the suspicious rank and mining the exceptional fund flows, a calculation method is put forward based on the number of transactions, the amount of money, and the proportion of the money that is supplied by the source account. Exceptional accounts are picked out according to their suspicious rank, and the exceptional fund flows are then mined from them.

Keywords: Data Mining, Link analysis, Clustering, Link-based clustering, Exceptional fund flow

1

1.1
Data Mining [1]; Link Analysis [2][3]; [4,5]

1.2

1.2.1

1.2.1.1
[6,7]; Dijkstra; PFS

1.2.1.2
Page, 1998; PageRank [8-10]; Title; Keywords; pages A and B; Kleinberg; HITS [11]

1.2.1.3
J. Kubica, 2002 [12,13]; [3]

1.2.1.4
Cluster [14]; Web

1.2.2
1. SAS
2. Mantas
3. FinCEN AI System [15,16]
4. NetMap [2,6]
5. COPLINK Detect [17-19]
6. ClearForest [2]: co-occurrence links, semantic links
7. Google: PageRank

1.3
Web

2

2.1
[20][21]
1. Linear Regression, Logistic Regression
2. Association analysis (itemset)
3. Clustering
4. Classification
5. Prediction: Inductive Algorithms
6. Time-series pattern
7. Deviation
8. Neural Network: MP, Hebb
9. Genetic Algorithm

2.2
[2,6]

2.2.1
Web; out-links, in-links

2.2.1.1
Internet [22,23]; PageRank [8]

Figure 2.1 PageRank (values shown in the figure: 100, 53, 50, 3, 3, 3, 9, 50, 50)

2.2.1.2
Yitong Wang; out-links, in-links [25,26]

Each page P is represented by two vectors, P_out (N-dimensional) and P_in (M-dimensional):

$$P_{out,i} = \begin{cases} 1 & \text{if } P \text{ has an out-link to page } i \\ 0 & \text{otherwise} \end{cases} \qquad P_{in,j} = \begin{cases} 1 & \text{if page } j \text{ has a link to } P \\ 0 & \text{otherwise} \end{cases}$$

$$Distance(P, Q) = \frac{P \cdot Q}{\|P\| \, \|Q\|} = \frac{P_{out} \cdot Q_{out} + P_{in} \cdot Q_{in}}{\|P\| \, \|Q\|} \tag{2-1}$$

$$\|P\|^2 = \sum_{i=1}^{N} P_{out,i}^2 + \sum_{j=1}^{M} P_{in,j}^2, \qquad \|Q\|^2 = \sum_{i=1}^{N} Q_{out,i}^2 + \sum_{j=1}^{M} Q_{in,j}^2$$

2.2.2
[27-29]; Jennifer J. Xu; Dijkstra [7]; edge weights between 0 and 1

Figure 2.2 (nodes A, B, C, D, E; edge weights A-B 0.5, B-C 0.8, C-D 0.7, A-E 0.8, E-D 0.3)

Two paths connect A and D: A-B-C-D and A-E-D. The weight of A-B-C-D is 0.5 × 0.8 × 0.7 = 0.28, and the weight of A-E-D is 0.8 × 0.3 = 0.24, so A-B-C-D is the stronger path between A and D.

Each weight w is converted into a length l:

$$l = -\ln w, \qquad 0 < w \le 1$$

1. Since $0 < w \le 1$, $-\ln w \ge 0$.
2. $l_1 < l_2 \iff -\ln w_1 < -\ln w_2 \iff w_1 > w_2$.
3. For two paths between A and B with edge lengths $l_1, l_2, \ldots, l_p$ and $l'_1, l'_2, \ldots, l'_q$, where $l_i = -\ln w_i$ and $l'_i = -\ln w'_i$:

$$\sum_{i=1}^{p} l_i < \sum_{i=1}^{q} l'_i \iff \exp\Big(\sum_{i=1}^{p} \ln w_i\Big) > \exp\Big(\sum_{i=1}^{q} \ln w'_i\Big) \iff \prod_{i=1}^{p} w_i > \prod_{i=1}^{q} w'_i$$

2.2.3
[30,31]; Jeremy Kubica [12,13]

Inputs and models: Demographic Data, Link Data, Demographic Model, Link Model, Chart.

Figure 2.3

Demographic Model: one classifier per group, p(Member G1 | Demographic), p(Member G2 | Demographic), p(Member G3 | Demographic), ..., p(Member G6 | Demographic).

Chart (Person × Group, columns G1 to G6; the column positions of the membership marks are not recoverable):

Person    Marks
Atkins    ***
Brown     **
Chapman   **
Dickens
Essex     ****
Franks    *

Demographic Data:

Person    Age   Job       Nationality
Atkins    24    Teacher   Britain
Brown     34    Clerk     USA
Chapman   30    Driver    USA
Dickens   18    Student   France
Essex     30    Teacher   Britain
Franks    25    Trader    USA

Link Model:

LinkType   Pl     PR
Phone      0.03   0.03
Meeting    0.20   0.20
Money      0.01   0.01
Email      0.05   0.05

Link Data:

Persons                   Type
{Atkins, Chapman}         Money
{Brown, Dickens, Essex}   Meeting
{Atkins, Brown, Essex}    Email
{Essex, Franks}           Email

1. Demographic Data (DD)
2. Link Data (LD)
3. Demographic Model (DM)
4. Link Model (LM)
5. Chart (CH)

2.3

2.3.1
Cluster [1,21]
1. Partitioning method
2. Hierarchical method
3. Density-based method: DBSCAN, OPTICS
4. Grid-based method
5. Model-based method

2.3.2
[32][14][33][34]; G = (V, E)

Equation (2-2): piecewise 0/1 definitions; the surviving conditions are "e_ij ∈ E or e_ji ∈ E" and "k_i = k_j".

1. Karger Min-Cut: Karger, 1993 [35]
2. MajorClust [36]
3. Spectral [37-40]: Shi and Malik, 2000

$$J(A, B) = \frac{Cut(A, B)}{\sum_{i \in A} d_i} + \frac{Cut(A, B)}{\sum_{j \in B} d_j} \tag{2-3}$$

$$Cut(A, B) = \sum_{i \in A,\, j \in B} S_{ij}, \qquad d_i = \sum_{k} S_{ik}$$

where $S_{ij}$ is the similarity between nodes i and j.

2.4
Web

3

3.1

3.2

3.2.1

Figure 3.3 (flowchart with Y/N branches)
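The weight-to-length conversion in Section 2.2.2 (l = -ln w) turns "find the path with the largest product of weights" into an ordinary shortest-path search. The following is a minimal sketch of that idea, not the thesis's implementation. The function name `strongest_path`, the `EDGES` list, and the choice to treat edges as undirected are assumptions made for illustration; the weights are those of the A-B-C-D / A-E-D example in Figure 2.2.

```python
import heapq
import math


def strongest_path(edges, source, target):
    """Return (path, strength) for the path with the largest weight product.

    Each weight w in (0, 1] is mapped to a length l = -ln w >= 0, so the
    shortest path by length is the strongest path by product of weights.
    """
    graph = {}
    for u, v, w in edges:
        if not 0 < w <= 1:
            raise ValueError("edge weights must lie in (0, 1]")
        length = -math.log(w)
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))  # assumption: undirected

    # Standard Dijkstra with a binary heap over the transformed lengths.
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, 0.0

    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    # exp(-sum of lengths) recovers the product of the original weights.
    return path, math.exp(-dist[target])


# Edge weights taken from the Figure 2.2 example.
EDGES = [
    ("A", "B", 0.5),
    ("B", "C", 0.8),
    ("C", "D", 0.7),
    ("A", "E", 0.8),
    ("E", "D", 0.3),
]

path, strength = strongest_path(EDGES, "A", "D")
print(path, round(strength, 2))
```

On this graph the search selects A-B-C-D with strength 0.28 over A-E-D with strength 0.24, in line with the worked example, even though A-E-D has fewer hops. The logarithm is what makes Dijkstra applicable: products of weights become sums of non-negative lengths.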