5. Basic Data Statistics with Python (PDF, 52 pages)


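Basic Data Statistics with Python (Basic Data Processing of Python)
Department of Computer Science and Technology
Department of University Basic Computer Teaching
Nanjing University

Simple data processing workflow
1. Data collection
2. Data organization
3. Data description
4. Data analysis


CONVENIENT DATA ACQUISITION
Playing with Data Using Python

Getting data with Python: local data
How is local data obtained? By opening, reading or writing, and closing files.
• Open the file
• Read or write the file
• Close the file

Getting data with Python: network data
How is network data obtained (crawled)? By fetching web pages and parsing their content.
• Fetching
  - urllib built-in module (urllib.request)
  - Requests third-party library
  - Scrapy framework
• Parsing
  - BeautifulSoup library
  - re module

Dow Jones component stock data
Two data sets are used: dji (the component stocks) and quotes (historical quotes for one stock).

Data form
dji is held in the DataFrame djidf, and quotes in the DataFrame quotesdf.

Convenient network data acquisition
Is there a simple, convenient, and fast way to obtain the historical stock data of a company from a finance website?

File:
    # Filename: quotes_fromcsv.py
    import pandas as pd

    # Load previously downloaded quotes from a local CSV file
    quotesdf = pd.read_csv('axp.csv')
    print(quotesdf)

Source (the request URL is empty in the source):
    >>> import requests
    >>> r = requests.get('')
    >>> r.text
    '{rating:{max:10,numRaters:218148,average:9.0,min:0},subtitle:,author:[[法]圣埃克苏佩里],pubdate:2003-8,tags:[{count:52078,name:小王子,title:小王子},{count:43966,name:童话,…,price:22.00元}'

NLTK corpora
• Gutenberg (gutenberg)
• Web and chat text (webtext)
• Inaugural addresses (inaugural)
• Brown (brown)
• Reuters (reuters)
• Other languages: multilingual corpora
• Custom corpora

Source:
    >>> from nltk.corpus import gutenberg
    >>> import nltk
    >>> print(gutenberg.fileids())
    ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt',
     'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt',
     'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt',
     'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt',
     'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt',
     'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
    >>> texts = gutenberg.words('shakespeare-hamlet.txt')
    >>> print(texts)
    ['[', 'The', 'Tragedie', 'of', 'Hamlet', 'by', ...]


DATA PREPARATION
Playing with Data Using Python

Data form
Logical structure of the data for the 30 Dow Jones component stocks (dji):
    company code | company name | last trade price
Logical structure of the historical stock data for American Express (quotes):
    close | date | high | low | open | volume

Data organization: adding a column index (columns) to djidf

File:
    # Filename: stock.py
    import requests
    import re
    import pandas as pd

    def retrieve_dji_list():
        …
        return dji_list

    dji_list = retrieve_dji_list()
    djidf = pd.DataFrame(dji_list)
    # Name the three columns of the component-stock table
    cols = ['code', 'name', 'lasttrade']
    djidf.columns = cols
    print(djidf)

Data organization: resulting forms
djidf after the columns have been added:
    code    name    lasttrade
    MMM
    AXP
    AAPL
    …
    WMT
quotesdf, where the raw data already carries column names:
    close    date          high    low    open    volume
             1464010200
             1464096600
             1464183000
             …
             1495200600

Data organization: using 1, 2, … as the index (row index)
    quotesdf = pd.DataFrame(quotes)
    quotesdf.index = range(1, len(quotes) + 1)

Data organization: dates as the index
If date is to be used directly as the index, can the timestamps in quotes be converted to ordinary calendar dates? date.fromtimestamp() does this; it interprets the timestamp in the local time zone.

Source:
    >>> from datetime import date
    >>> firstday = date.fromtimestamp(1464010200)
    >>> lastday = date.fromtimestamp(1495200600)
    >>> firstday
    datetime.date(2016, 5, 23)
    >>> lastday
    datetime.date(2017, 5, 19)

Time series

File:
    # Filename: quotes_history_v2.py
    from datetime import date
    import pandas as pd

    def retrieve_quotes_historical(stock_code):
        …
        return [item for item in quotes if 'type' not in item]

    quotes = retrieve_quotes_historical('AXP')
    list1 = []
    for i in range(len(quotes)):
        x = date.fromtimestamp(quotes[i]['date'])  # convert to a regular date
        y = date.strftime(x, '%Y-%m-%d')           # convert to a fixed format
        list1.append(y)
    quotesdf_ori = pd.DataFrame(quotes, index=list1)
    quotesdf_m = quotesdf_ori.drop(['unadjclose'], axis=1)  # delete the original unadjclose column
    quotesdf = quotesdf_m.drop(['date'], axis=1)            # delete the original date column
    print(quotesdf)

Creating a time series

Source:
    >>> import pandas as pd
    >>> dates = pd.date_range('20170520', periods=7)
    >>> dates
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2017-05-20, ..., 2017-05-26]
    Length: 7, Freq: D, Timezone: None
    >>> import numpy as np
    >>> datesdf = pd.DataFrame(np.random.randn(7, 3), index=dates, columns=list('ABC'))
    >>> datesdf
                       A         B         C
    2017-05-20  1.302600 -1.214708  1.411628
    2017-05-21 -0.512343  2.277474  0.403811
    2017-05-22 -0.788498 -0.217161  0.173284
    2017-05-23  1.042167 -0.453329 -2.107163
    2017-05-24 -1.628075  1.663377  0.943582
    2017-05-25 -0.091034  0.335884  2.455431
    2017-05-26 -0.679055 -0.865973  0.246970


DATA DISPLAY
Playing with Data Using Python

The examples in this part display djidf and quotesdf.

Display options:
• Display the row index
• Display the column index
• Display the data values
• Display the data description

Source:
    >>> list(djidf.index)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
    >>> list(djidf.columns)
    ['code', 'name', 'lasttrade']
    >>> djidf.values
    array([['MMM', '3M', 195.8],
           ...,
           ['WMT', 'Wal-Mart', 78.77]], dtype=object)
    >>> djidf.describe
    <bound method NDFrame.describe of    code      name  lasttrade
    0   MMM        3M     195.80
    ...,
    29  WMT  Wal-Mart      78.77>

Written without parentheses, djidf.describe only shows the bound method together with the frame itself; calling djidf.describe() returns the summary statistics.

Data display: data format
The last trade price is converted to float when the list is built, so the lasttrade column has dtype float64.

    dji_list = []
    for item in dji_list_in_text:
        dji_list.append([item[0], item[1], float(item[2])])

Source:
    >>> djidf.lasttrade
    1     199.54
    2      77.44
    3     153.87
    …
    30     78.31
    Name: lasttrade, dtype: float64

Data display: rows and columns
How can the basic information of the first 5 and the last 5 Dow Jones component stocks be viewed?

Display options:
• Display rows
  - Dedicated methods: djidf.head(5), djidf.tail(5)
  - Slicing: djidf[:5], djidf[-5:]
• Display columns

Source:
    >>> djidf.head(5)
       code              name  lasttrade
    0   MMM                3M     195.80
    1   AXP  American Express      76.80
    2  AAPL             Apple     153.06
    3    BA            Boeing     180.76
    4   CAT       Caterpillar     102.43
    >>> djidf.tail(5)
       code                 name  lasttrade
    25  UTX  United Technologies     121.16
    26  UNH         UnitedHealth     172.59
    27   VZ              Verizon      45.42
    28    V                 Visa      92.48
    29  WMT             Wal-Mart      78.77


DATA SELECTION
Playing with Data Using Python

Selection options:
• Select rows
• Select columns
• Select a region
• Filter (conditional selection)

Source:
    >>> quotesdf['2017-05-01':'2017-05-05']
                    close       high        low       open   volume
    2017-05-01  79.230003  79.489998  78.879997  79.220001  3458100
    2017-05-02  79.540001  79.660004  79.150002  79.150002  3334900
    2017-05-03  78.830002  79.510002  78.690002  79.230003  3800600
    2017-05-04  78.330002  79.419998  77.989998  79.230003  3902200
    2017-05-05  78.320000  78.730003  77.879997  78.610…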
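
The date-indexed selection above can be tried without downloading any data. The following is only a minimal sketch, not the course's own code: the four sample records reuse close and volume figures from the table above, while their epoch timestamps are assumed values (13:30 UTC on each day, following the pattern of the timestamps shown earlier). It uses pd.to_datetime(..., unit='s') rather than date.fromtimestamp() so that the result does not depend on the local time zone, and .loc for the label slice.

```python
import pandas as pd

# Illustrative records shaped like the quotes data: an epoch-second 'date'
# plus price fields. The timestamps are assumptions made for this sketch.
quotes = [
    {"date": 1493645400, "close": 79.23, "volume": 3458100},  # 2017-05-01
    {"date": 1493731800, "close": 79.54, "volume": 3334900},  # 2017-05-02
    {"date": 1493818200, "close": 78.83, "volume": 3800600},  # 2017-05-03
    {"date": 1493904600, "close": 78.33, "volume": 3902200},  # 2017-05-04
]

quotesdf = pd.DataFrame(quotes)

# Read 'date' as seconds since the Unix epoch (UTC), keep only the calendar
# day, and use it as the row index in place of the default 0, 1, 2, ...
quotesdf.index = pd.to_datetime(quotesdf["date"], unit="s").dt.normalize()
quotesdf.index.name = None
quotesdf = quotesdf.drop(columns=["date"])

# Select rows: label slicing on a date index includes both end points.
selected = quotesdf.loc["2017-05-02":"2017-05-03"]

# Filter (conditional selection): days whose close was above 79.
high_close = quotesdf[quotesdf["close"] > 79]

print(selected)
print(high_close)
```

Unlike integer slicing such as djidf[:5], a label slice keeps the row for the end label, so the first selection contains both 2017-05-02 and 2017-05-03.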
