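To understand and study machine learning, you should first master Python or R, which are languages much like C, Java, and PHP (Translator's note: they are actually quite different). However, Python and R are both relatively young (Translator's note: I don't follow; Python is hardly young) and higher-level, so you don't need to understand the low-level details at all (Translator's note: ?). That is why both are easy to learn. Where Python really shines is that it can handle a wider range of problems, such as machine learning, algorithms, and images, whereas R can only do data processing and analysis. Python also has a broader range of applications, such as the back-end framework Django (Translator's note: the original reads 'Hosting websites: Jango'), natural language processing (Translator's note: the original reads 'natural language proecssing'; the author was rather careless here. It means NLP), accessing websites, and more. On top of that, Python is more like C (Translator's note: nonsense), which is why it is so popular now.

The original article, by a Russian author, contains quite a few errors, which I have corrected according to my own understanding; treat the corrections as for reference only. I fixed grammatical errors directly. Where the author's content is wrong, I left it unchanged and explained the problem in parentheses.

Four steps for beginners doing machine learning with Python

Learn the basics of Python from books, MOOCs, or videos.
To process data, you need to know some modules, such as Pandas, NumPy, Matplotlib, and a natural language processing library such as NLTK.
Next, you need to scrape data, either through APIs or directly from websites. The web-scraping module: BeautifulSoup (Translator's note: that should be Scrapy; BS is an HTML/XML parser). We use the data we collect to train algorithms.
The final step is to learn the relevant ML algorithms and the Scikit-learn toolkit.

1. Learn Python

The simplest, most direct way to learn Python is to sign up for an account on Codecademy and learn the basics there. Learn Python The Hard Way is a classic site recommended by many programmers. A Byte of Python is well worth working through. The Python community also provides a list of learning resources for beginners. Think Python, a book published by O'Reilly, can be downloaded for free. Finally, Introduction to Python for Econometrics, Statistics and Data Analysis also covers a lot of Python basics.

2. Import modules

The most important modules and tools for machine learning are NumPy, Pandas, Matplotlib, and IPython. The book Data Analysis with Open Source Tools covers all of them, as does Introduction to Python for Econometrics, Statistics and Data Analysis, mentioned above. There is also the book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython.

Here are some free resources:
10 Minutes to pandas
Pandas for machine learning
100 NumPy exercises

3. Scrape and mine data

Once you have mastered the basics of Python, the next step is to learn how to scrape data, i.e. web scraping. Sites such as Twitter and LinkedIn provide APIs that let us retrieve text data. Here are a few good books on the subject: Mining the Social Web (free), Web Scraping with Python, and Web Scraping with Python: Collecting Data from the Modern Web.

Finally, this text data has to be turned into numerical data using NLP techniques: Natural Language Processing with Python. Images and video call for image processing and computer vision (CV). Here are a few good resources: Programming Computer Vision with Python (free), Programming Computer Vision with Python: Tools and algorithms for analyzing images, and Practical Python and OpenCV.

Some examples of web scraping with Python:
Mini-Tutorial: Saving Tweets to a Database with Python
Web Scraping Indeed for Key Data Science Job Skills
Case Study: Sentiment Analysis On Movie Reviews
First Web Scraper
Sentiment Analysis of Emails
Simple Text Classification
Basic Sentiment Analysis with Python
Twitter sentiment analysis using Python and NLTK
Second Try: Sentiment Analysis in Python
Natural Language Processing in a Kaggle Competition for Movie Reviews

4. Machine learning

Machine learning can be divided into four parts: classification, clustering, regression, and dimensionality reduction.

Machine learning in Python: the Scikit-learn website has many tutorials. Here are some others:
Introduction to Machine Learning with Python and Scikit-Learn
Data Science in Python
Machine Learning for Predicting Bad Loans
A Generic Architecture for Text Classification with Machine Learning
Using Python and AI to predict types of wine
Advice for applying Machine Learning
Predicting customer churn with scikit-learn
Mapping Your Music Collection
Case Study: Sentiment Analysis on Movie Reviews
Document Clustering with Python
Five most popular similarity measures implementation in python
Will it Python?
Text Processing in Machine Learning
Hacking an epic NHL goal celebration with a hue light show and real-time machine learning
Vancouver Room Prices
Exploring and Predicting University Faculty Salaries
Predicting Airline Delays

Books:
Collection of books on reddit
Building Machine Learning Systems with Python
Building Machine Learning Systems with Python, 2nd Edition
Learning scikit-learn: Machine Learning in Python
Machine Learning: An Algorithmic Perspective
Data Science from Scratch: First Principles with Python
Machine Learning in Python

Machine learning blogs and courses

Online courses: Collection of links.
MOOCs: Machine Learning and Data Analyst Nanodegree.
Here are some blogs.

Machine learning theory

The Elements of Statistical Learning
Introduction to Statistical Learning
Books: Introduction to Machine Learning, A Course in Machine Learning.
There is also: Watch 15 hours theory of machine learning!

The further I read, the less I felt like translating. There is honestly little substance here, so I am simply listing the resources. Below is a reading list recommended by Dr. Dahua Lin (林达华), an MIT PhD and a leading machine learning researcher.

Machine Learning

Pattern Recognition and Machine Learning
By Christopher M. Bishop
A new treatment of classic machine learning topics, such as classification, regression, and time series analysis, from a Bayesian perspective. It is a must-read for people who intend to do research on Bayesian learning and probabilistic inference.

Graphical Models, Exponential Families, and Variational Inference
By Martin J. Wainwright and Michael I. Jordan
It is a comprehensive and brilliant presentation of three closely related subjects: graphical models, exponential families, and variational inference. This is the best manuscript that I have ever read on this subject. Strongly recommended to everyone interested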
in graphical models. The connections between various inference algorithms and convex optimization are clearly explained. Note: a PDF version of this book is freely available online.

Big Data: A Revolution That Will Transform How We Live, Work, and Think
By Viktor Mayer-Schönberger and Kenneth Cukier
A short but insightful manuscript that will motivate you to rethink how we should face the explosive growth of data in the new century.

Statistical Pattern Recognition (2nd/3rd Edition)
By Andrew R. Webb and Keith D. Copsey
A well-written book on pattern recognition for beginners. It covers basic topics in this field, including discriminant analysis, decision trees, feature selection, and clustering, all of which are basics that researchers in machine learning or pattern recognition should understand.

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
By Bernhard Schölkopf and Alexander J. Smola
A comprehensive and in-depth treatment of kernel methods and support vector machines. It not only clearly develops the mathematical foundation, namely the reproducing kernel Hi