Shandong University of Science and Technology (山东科技大学)
Undergraduate Graduation Project (Thesis)

Title: Big Data and Data Mining Methods
College: College of Mathematics and Systems Science
Major and class: Statistics 10
Student name: Zhou Guangjun (周广军)
Student ID: 201001051633
Advisor: Gao Jinggui (高井贵)
June 2014

Big Data and Data Mining Methods

Abstract

With advances in computer technology and the rapid growth of the Internet and new media, daily life has entered an era of high-speed information. Everyday activity generates large amounts of data, so the speed and scale at which data is acquired keep growing, and the data continually written to storage media accumulates into massive data sets. Storing, applying, and mining massive data has become an important research topic.

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data held in databases, data warehouses, or other information repositories. The extracted knowledge takes the form of rules, concepts, regularities, and patterns. Data mining is a broadly interdisciplinary field: it combines database technology, artificial intelligence, statistics, and other areas from a new perspective in order to uncover, at a deeper level, novel, valid, potentially useful, and ultimately understandable patterns within data.

In data mining, data is divided into training data, test data, and application data. The key is to discover facts in the training data, to use the test data as the basis for checking and revising the resulting theory, and to apply the resulting knowledge to data.

This thesis first explains the concept of big data and its emergence and development, and then introduces the mainstream methods of data analysis and mining.

Keywords: big data; data mining; data analysis methods

Contents

Big Data and Data Mining Methods
Abstract (Chinese)
Abstract (English)
Contents
1 The Origin of Big Data
  1.1 The Emergence of the Term "Big Data"
  1.2 Concept, Characteristics, and Value of Big Data
    1.2.1 The Concept of Big Data
    1.2.2 Characteristics of Big Data
    1.2.3 The Value of Big Data
  1.3 The Inevitability of Big Data
  1.4 Current State of Big Data Development
    (1) Active Government Involvement and Promotion
    (2) Strong Capital Market Interest in Big Data
    (3) Huge Demand for Talent
    (4) The Situation in China
2 Big Data Processing
3 Data Mining Methods
  3.1 Neural Networks
    3.1.1 Basic Introduction to Artificial Neural Networks
    3.1.2 Designing the Neural Network Structure
    3.1.3 Probabilistic Learning
    3.1.4 Advantages and Disadvantages of Neural Network Methods
  3.2 Genetic Algorithms
    3.2.1 Characteristics of Genetic Algorithms
    3.2.2 The Basic Genetic Algorithm
    3.2.3 Advantages and Disadvantages of Genetic Algorithms
  3.3 Decision Tree Methods
    3.3.1 Decision Tree Representation
    3.3.2 The Idea Behind Decision Tree Construction
    3.3.3 Advantages and Disadvantages of Decision Tree Methods
  3.4 Association Rules