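First of all, this post is purely for fun, and plenty of it is not rigorous!!
When Mankiw's "Macroeconomics" introduces the CPI (consumer price index), it gives an example: at the time, Avatar ranked first at the box office with 761 million US dollars, but once inflation is taken into account Avatar drops to 14th place, and Gone with the Wind (1939) comes out on top.
Then today I saw a trending topic saying that 《长津湖》 (The Battle at Lake Changjin) had become the all-time box office champion in Chinese film history, so I wanted to see whether our own box office ranking changes once inflation is considered, hhh! To repeat: purely for fun!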
1. Mainland total box office data
The box office data comes from the mainland all-time box office chart on 艺恩娱数 (Endata).
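Press F12 to open the developer tools, go to the Network tab, filter by XHR, and then open the Response panel, as shown in the figure below!
The response shows that this POST request returns JSON. To see the data inside more clearly, you can run it through a formatter such as the online JSON validation and formatting tool (Be JSON). Once formatted, it is easy to see that the data contains exactly what we need: the movie name, the release date, and the cumulative box office. The formatted JSON is shown in the figure below:
The simplest approach is to copy the JSON string straight into the Python code, where it looks exactly like the formatted version from the website. See the figure below.
At this point the data variable can be used as a dict, and split can be applied to the release date so that the field keeps only the release year. As the figure below shows, the name, release year, and box office of the first 10 entries are printed.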
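A minimal sketch of that step, assuming the pasted response keeps the `data` → `table0` layout. Only two entries are reproduced here, and the names `sample` and `extract_rows` are illustrative rather than taken from the original code.

```python
# Two entries copied from the response; the real `data` holds 50 of them.
sample = {"data": {"table0": [
    {"MovieName": "长津湖", "ReleaseTime": "2021-09-30", "BoxOffice": 5689581646},
    {"MovieName": "战狼2", "ReleaseTime": "2017-07-27", "BoxOffice": 5688740633},
]}}


def extract_rows(payload, limit=10):
    """Return (name, release year, box office) for the first `limit` movies."""
    rows = []
    for movie in payload["data"]["table0"][:limit]:
        year = movie["ReleaseTime"].split("-")[0]  # "2021-09-30" -> "2021"
        rows.append((movie["MovieName"], year, movie["BoxOffice"]))
    return rows


for row in extract_rows(sample):
    print(row)
```

Note that the year stays a string after `split`, so it has to be converted with `int` before any arithmetic.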
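Then I ran into something rather awkward: the site's front end only shows the top 50 films. Oops!
No matter. Looking closely at the POST request that returns this JSON, the secret turns out to be "top:50", which is why the request only returns the top 50 to the front end. Also note "type:1", which distinguishes between all films, domestic films, and imported films! Try it yourself: when you select imported films, type becomes 2.
Luck was on our side: a plain POST is enough to get the response. As the figure below shows, we now have the top 500, which gets around the front-end limit. This comes up often when writing crawlers: the server sends data and the front end only displays part of it. Take the loan data on 人人贷 (Renrendai): the page does not show the borrower's reason for the loan, but F12 reveals that the server response does include that field; the front end simply does not display it.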
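The request described above could be sketched roughly as follows. The helper names `build_payload` and `fetch_rank` are hypothetical, the URL and form fields are the ones used in the full code at the end of this post, and the endpoint may since have changed or started requiring extra headers, so treat this as a sketch rather than a guaranteed recipe.

```python
def build_payload(top=500, rank_type=1):
    """Form fields for the ranking request; both values are sent as strings."""
    return {"top": str(top), "type": str(rank_type)}


def fetch_rank(top=500, rank_type=1, timeout=10):
    """POST the form and return the list of movies (needs network access)."""
    import requests  # imported lazily so build_payload works without it

    url = "https://ys.endata.cn/enlib-api/api/home/getrank_mainland.do"
    resp = requests.post(url, data=build_payload(top, rank_type), timeout=timeout)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()["data"]["table0"]


print(build_payload())
```

Calling `fetch_rank()` would then return the list that the rest of the post stores in a DataFrame, provided the site still answers this request.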
Next, store all of this in a pandas DataFrame to make later processing easier. The code and result are as follows:
# Build the rows first, then create the DataFrame in one go
# (note the spelling pd.DataFrame; DataFrame.append was removed in pandas 2.0)
records = [{"MovieName": each["MovieName"],
            "ReleaseTime": each["ReleaseTime"].split("-")[0],  # keep only the year
            "BoxOffice": each["BoxOffice"]}
           for each in res["data"]["table0"]]
dataDf = pd.DataFrame(records)
dataDf
Now we can plot the mainland total box office ranking before any inflation adjustment!! The code and figure are below:
Reference for pandas plot(): 【python】详解pandas.Dataframe.plot( )画图函数_brucewong0516的博客-CSDN博客
# kind="barh": horizontal bar chart
# figsize: figure size
# legend=False: hide the legend
# color: bar colors
# fontsize: font size of the tick labels
# [::-1]: reverse the order, so the top-ranked film is drawn at the top
dataDf[:20][::-1].plot(x="MovieName", y="BoxOffice", kind="barh", figsize=(16, 8), fontsize=15, legend=False,
                       color=["grey", "gold", "darkviolet", "turquoise", "r", "g", "b", "c",
                              "k", "darkorange", "lightgreen", "plum", "tan", "khaki",
                              "pink", "skyblue", "lawngreen", "salmon"])
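That wraps up the processing of the box office chart for now. Next we need an indicator of the price level.
My first thought was to use the CPI for the adjustment, but after looking at the CPI data I found that it is basically very stable. I was quite surprised: prices feel like they have been climbing steadily ever since I was a kid, so why is the CPI so stable? If you are curious, look up the reason; it is rather interesting.
So what is a good measure of purchasing power? Haha, I don't know either. Many indicators feel unsuitable. The best fit seems to be a currency purchasing power index, but I could not find the data.
So the entertainment part officially begins!!!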
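For how to measure purchasing power, this post relies on the following article: 100元人民币的购买力!70年(1950-2020年)演变史见证国家的发展|通货膨胀|计划经济_网易订阅
According to that article, 100 yuan in the 1990s is worth about 1000 yuan in 2020; 100 yuan in the 2000s is worth 300 to 400 yuan in 2020, for which I take 350; and 100 yuan in the 2010s is worth about 157 yuan in 2020.
So, with 1990 as the base, the price levels are 1 (1990), 2.86 (2000), 6.37 (2010), and 10 (2020). Each value is 1000 divided by the 2020 equivalent of that decade's 100 yuan, i.e. 1000/1000, 1000/350, 1000/157, and 1000/100.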
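The arithmetic behind those four numbers can be written out as a short sketch. The dictionary name `value_in_2020` is illustrative, and pinning each decade's figure to the first year of the decade follows the choice made in this post.

```python
# 2020 value of 100 yuan from each decade, as quoted from the article.
value_in_2020 = {1990: 1000, 2000: 350, 2010: 157, 2020: 100}

# Price level relative to 1990: the more 2020 yuan an old 100 yuan is worth,
# the lower the price level was back then.
base = value_in_2020[1990]
price_level = {year: base / value for year, value in value_in_2020.items()}

for year, level in price_level.items():
    print(year, round(level, 2))
```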
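Next, look at how the release years of the top 500 films are distributed. As the figure below shows, the earliest release year is 2000, so we need price levels for 2000 through 2021.
Here is the problem: we need price levels for 22 years, but we only have four data points. What do we do?
Hahaha, fit a curve. This is for fun after all, so close enough is good enough!
Judging from the plot, the four points lie fairly close to a straight line, so fitting a line with least squares will do!
Least squares code: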
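# Ordinary least squares (OLS); np is NumPy, imported as in the full code
def standRegres(xArr, yArr):
    X = np.asarray(xArr, dtype=float)   # design matrix, one row per sample: [1, x]
    y = np.asarray(yArr, dtype=float)
    xTx = X.T @ X
    if np.linalg.det(xTx) == 0.0:
        print("singular")
        return None
    # Solve the normal equations (X^T X) ws = X^T y
    ws = np.linalg.solve(xTx, X.T @ y)
    return ws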
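As a cross-check on the matrix version, the same straight line can be obtained from the textbook closed-form formulas for simple linear regression. This is a sketch in plain Python; `fit_line` is a hypothetical helper, and x is measured in years since 1990, as in the rest of the post.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x using the closed-form formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0, 10, 20, 30]                            # years since 1990
ys = [1000 / v for v in (1000, 350, 157, 100)]  # price levels, 1990 = 1
intercept, slope = fit_line(xs, ys)
print(round(intercept, 4), round(slope, 4))
```

Evaluating the fitted line at x = 31 gives the price level used for 2021 later on.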
The fit looks like this; the error is acceptable:
The following code then gives the price level for each year from 2000 to 2021:
priceLevelNew = []
for each in range(10, 32):  # x = 10..31 corresponds to the years 2000..2021
    priceLevelNew.append(float(ws[0]) + float(ws[1]) * each)
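The plan is now clear: add the price level of the matching year to the DataFrame from Section 1, then adjust the box office figures. The code and result are shown in the figure below, where the price level has been matched to each release year.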
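The matching step might look like the sketch below. It uses plain dicts instead of the DataFrame so that it runs without pandas, and `INTERCEPT` and `SLOPE` are rounded stand-ins for the fitted coefficients, so the printed levels are approximate.

```python
# Rounded coefficients of the fitted line (assumed values, x = years since 1990).
INTERCEPT, SLOPE = 0.4798, 0.3051

# One price level per year from 2000 to 2021, in the same order as priceLevelNew.
price_levels = [INTERCEPT + SLOPE * x for x in range(10, 32)]

movies = [
    {"MovieName": "长津湖", "ReleaseTime": "2021", "BoxOffice": 5689581646},
    {"MovieName": "战狼2", "ReleaseTime": "2017", "BoxOffice": 5688740633},
]

for movie in movies:
    # Index 0 is the year 2000, so the offset is release year minus 2000.
    movie["priceLevel"] = price_levels[int(movie["ReleaseTime"]) - 2000]
    print(movie["MovieName"], movie["ReleaseTime"], round(movie["priceLevel"], 3))
```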
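Finally, we can use the price level to roughly adjust money from different periods, namely:
Amount in this year's RMB = amount in year-T RMB × this year's price level / year-T price level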
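The formula and the re-ranking it leads to can be illustrated with a toy sketch. The two films and their price levels are made-up round numbers chosen only to show how an older film can overtake a newer one; they are not real data.

```python
def adjust_to_now(amount, level_then, level_now):
    """Convert an amount in year-T yuan into this year's yuan."""
    return amount * level_now / level_then


# (name, nominal box office, price level in the release year): illustrative only.
films = [
    ("Film A", 300, 5.0),
    ("Film B", 400, 10.0),
]
level_now = 10.0

adjusted = [(name, adjust_to_now(box, level, level_now)) for name, box, level in films]
adjusted.sort(key=lambda item: item[1], reverse=True)  # highest adjusted value first
print(adjusted)
```

Film B leads on nominal figures, but Film A comes first once both are expressed in this year's yuan.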
Sort by the adjusted box office; the code and result are as follows:
Finally, plot the adjusted ranking:
3. Finally
Haha, an awkward conclusion: the ranking barely changes.
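Still, there is room for improvement. For example, look beyond the current top 500 so that the release years reach further back in time, and find a better indicator of the price level, hhh!
Like I said, this post is purely for fun. Believe it or not, it's up to you!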
4. Code
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import requests

plt.rcParams['axes.unicode_minus'] = False     # make minus signs render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']   # Chinese font, so the movie titles render

# Ordinary least squares (OLS)
def standRegres(xArr, yArr):
    X = np.asarray(xArr, dtype=float)   # design matrix, one row per sample: [1, x]
    y = np.asarray(yArr, dtype=float)
    xTx = X.T @ X
    if np.linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return None
    # Solve the normal equations (X^T X) ws = X^T y
    ws = np.linalg.solve(xTx, X.T @ y)
    return ws

# Top-50 response copied from the browser
data = {"status":1,"des":"成功","userstatus":0,"version":0,"data":{"table0":[
{"MovieName":"长津湖","AvgAudienceCount":23,"ReleaseTime":"2021-09-30","AvgBoxOffice":47,"BoxOffice":5689581646,"Irank":1,"EnMovieID":703496},
{"MovieName":"战狼2","AvgAudienceCount":37,"ReleaseTime":"2017-07-27","AvgBoxOffice":36,"BoxOffice":5688740633,"Irank":2,"EnMovieID":641515},
{"MovieName":"你好,李焕英","AvgAudienceCount":24,"ReleaseTime":"2021-02-12","AvgBoxOffice":45,"BoxOffice":5413303171,"Irank":3,"EnMovieID":662746},
{"MovieName":"哪吒之魔童降世","AvgAudienceCount":23,"ReleaseTime":"2019-07-26","AvgBoxOffice":36,"BoxOffice":5035020595,"Irank":4,"EnMovieID":662685},
{"MovieName":"流浪地球","AvgAudienceCount":29,"ReleaseTime":"2019-02-05","AvgBoxOffice":45,"BoxOffice":4686808164,"Irank":5,"EnMovieID":642412},
{"MovieName":"唐人街探案3","AvgAudienceCount":29,"ReleaseTime":"2021-02-12","AvgBoxOffice":48,"BoxOffice":4522345605,"Irank":6,"EnMovieID":676314},
{"MovieName":"复仇者联盟4:终局之战","AvgAudienceCount":23,"ReleaseTime":"2019-04-24","AvgBoxOffice":49,"BoxOffice":4250383910,"Irank":7,"EnMovieID":670808},
{"MovieName":"红海行动","AvgAudienceCount":33,"ReleaseTime":"2018-02-16","AvgBoxOffice":39,"BoxOffice":3651886398,"Irank":8,"EnMovieID":655823},
{"MovieName":"唐人街探案2","AvgAudienceCount":39,"ReleaseTime":"2018-02-16","AvgBoxOffice":39,"BoxOffice":3397688097,"Irank":9,"EnMovieID":663419},
{"MovieName":"美人鱼","AvgAudienceCount":43,"ReleaseTime":"2016-02-08","AvgBoxOffice":37,"BoxOffice":3397175023,"Irank":10,"EnMovieID":626153},
{"MovieName":"我和我的祖国","AvgAudienceCount":35,"ReleaseTime":"2019-09-30","AvgBoxOffice":38,"BoxOffice":3176119334,"Irank":11,"EnMovieID":691481},
{"MovieName":"八佰","AvgAudienceCount":20,"ReleaseTime":"2020-08-21","AvgBoxOffice":38,"BoxOffice":3102323734,"Irank":12,"EnMovieID":669412},
{"MovieName":"我不是药神","AvgAudienceCount":27,"ReleaseTime":"2018-07-05","AvgBoxOffice":35,"BoxOffice":3099961063,"Irank":13,"EnMovieID":676313},
{"MovieName":"中国机长","AvgAudienceCount":26,"ReleaseTime":"2019-09-30","AvgBoxOffice":37,"BoxOffice":2913117677,"Irank":14,"EnMovieID":681319},
{"MovieName":"我和我的家乡","AvgAudienceCount":19,"ReleaseTime":"2020-10-01","AvgBoxOffice":39,"BoxOffice":2828832552,"Irank":15,"EnMovieID":701620},
{"MovieName":"速度与激情8","AvgAudienceCount":30,"ReleaseTime":"2017-04-14","AvgBoxOffice":37,"BoxOffice":2670959285,"Irank":16,"EnMovieID":659757},
{"MovieName":"西虹市首富","AvgAudienceCount":28,"ReleaseTime":"2018-07-27","AvgBoxOffice":35,"BoxOffice":2547571742,"Irank":17,"EnMovieID":671983},
{"MovieName":"捉妖记","AvgAudienceCount":41,"ReleaseTime":"2015-07-16","AvgBoxOffice":37,"BoxOffice":2441462276,"Irank":18,"EnMovieID":627896},
{"MovieName":"速度与激情7","AvgAudienceCount":42,"ReleaseTime":"2015-04-12","AvgBoxOffice":39,"BoxOffice":2426586547,"Irank":19,"EnMovieID":629625},
{"MovieName":"复仇者联盟3:无限战争","AvgAudienceCount":19,"ReleaseTime":"2018-05-11","AvgBoxOffice":38,"BoxOffice":2390537273,"Irank":20,"EnMovieID":675789},
{"MovieName":"捉妖记2","AvgAudienceCount":44,"ReleaseTime":"2018-02-16","AvgBoxOffice":38,"BoxOffice":2237154621,"Irank":21,"EnMovieID":656875},
{"MovieName":"疯狂的外星人","AvgAudienceCount":30,"ReleaseTime":"2019-02-05","AvgBoxOffice":42,"BoxOffice":2214254201,"Irank":22,"EnMovieID":638300},
{"MovieName":"羞羞的铁拳","AvgAudienceCount":25,"ReleaseTime":"2017-09-30","AvgBoxOffice":33,"BoxOffice":2201748735,"Irank":23,"EnMovieID":661004},
{"MovieName":"海王","AvgAudienceCount":18,"ReleaseTime":"2018-12-07","AvgBoxOffice":36,"BoxOffice":2013198359,"Irank":24,"EnMovieID":665526},
{"MovieName":"变形金刚4:绝迹重生","AvgAudienceCount":50,"ReleaseTime":"2014-06-27","AvgBoxOffice":42,"BoxOffice":1977522388,"Irank":25,"EnMovieID":612232},
{"MovieName":"前任3:再见前任","AvgAudienceCount":29,"ReleaseTime":"2017-12-29","AvgBoxOffice":35,"BoxOffice":1941740154,"Irank":26,"EnMovieID":663359},
{"MovieName":"毒液:致命守护者","AvgAudienceCount":17,"ReleaseTime":"2018-11-09","AvgBoxOffice":36,"BoxOffice":1870680440,"Irank":27,"EnMovieID":662209},
{"MovieName":"功夫瑜伽","AvgAudienceCount":33,"ReleaseTime":"2017-01-28","AvgBoxOffice":38,"BoxOffice":1752603744,"Irank":28,"EnMovieID":629898},
{"MovieName":"飞驰人生","AvgAudienceCount":25,"ReleaseTime":"2019-02-05","AvgBoxOffice":42,"BoxOffice":1729373180,"Irank":29,"EnMovieID":676018},
{"MovieName":"烈火英雄","AvgAudienceCount":19,"ReleaseTime":"2019-08-01","AvgBoxOffice":36,"BoxOffice":1707188998,"Irank":30,"EnMovieID":692321},
{"MovieName":"侏罗纪世界2","AvgAudienceCount":19,"ReleaseTime":"2018-06-15","AvgBoxOffice":36,"BoxOffice":1695881571,"Irank":31,"EnMovieID":667168},
{"MovieName":"寻龙诀","AvgAudienceCount":40,"ReleaseTime":"2015-12-18","AvgBoxOffice":36,"BoxOffice":1682742863,"Irank":32,"EnMovieID":614981},
{"MovieName":"西游伏妖篇","AvgAudienceCount":36,"ReleaseTime":"2017-01-28","AvgBoxOffice":39,"BoxOffice":1655926405,"Irank":33,"EnMovieID":619719},
{"MovieName":"港囧","AvgAudienceCount":40,"ReleaseTime":"2015-09-25","AvgBoxOffice":33,"BoxOffice":1614103585,"Irank":34,"EnMovieID":618038},
{"MovieName":"姜子牙","AvgAudienceCount":19,"ReleaseTime":"2020-10-01","AvgBoxOffice":40,"BoxOffice":1602983421,"Irank":35,"EnMovieID":682630},
{"MovieName":"少年的你","AvgAudienceCount":16,"ReleaseTime":"2019-10-25","AvgBoxOffice":36,"BoxOffice":1559025893,"Irank":36,"EnMovieID":680681},
{"MovieName":"变形金刚5:最后的骑士","AvgAudienceCount":23,"ReleaseTime":"2017-06-23","AvgBoxOffice":37,"BoxOffice":1551242789,"Irank":37,"EnMovieID":656946},
{"MovieName":"疯狂动物城","AvgAudienceCount":28,"ReleaseTime":"2016-03-04","AvgBoxOffice":34,"BoxOffice":1534528494,"Irank":38,"EnMovieID":643235},
{"MovieName":"我和我的父辈","AvgAudienceCount":16,"ReleaseTime":"2021-09-30","AvgBoxOffice":43,"BoxOffice":1474411166,"Irank":39,"EnMovieID":706356},
{"MovieName":"魔兽","AvgAudienceCount":25,"ReleaseTime":"2016-06-08","AvgBoxOffice":37,"BoxOffice":1472297906,"Irank":40,"EnMovieID":402117},
{"MovieName":"复仇者联盟2:奥创纪元","AvgAudienceCount":29,"ReleaseTime":"2015-05-12","AvgBoxOffice":40,"BoxOffice":1464392888,"Irank":41,"EnMovieID":631792},
{"MovieName":"夏洛特烦恼","AvgAudienceCount":33,"ReleaseTime":"2015-09-30","AvgBoxOffice":32,"BoxOffice":1447823756,"Irank":42,"EnMovieID":628183},
{"MovieName":"速度与激情:特别行动","AvgAudienceCount":15,"ReleaseTime":"2019-08-23","AvgBoxOffice":36,"BoxOffice":1434299899,"Irank":43,"EnMovieID":682202},
{"MovieName":"送你一朵小红花","AvgAudienceCount":12,"ReleaseTime":"2020-12-31","AvgBoxOffice":37,"BoxOffice":1432524430,"Irank":44,"EnMovieID":701874},
{"MovieName":"芳华","AvgAudienceCount":25,"ReleaseTime":"2017-12-15","AvgBoxOffice":34,"BoxOffice":1422584326,"Irank":45,"EnMovieID":659453},
{"MovieName":"侏罗纪世界","AvgAudienceCount":33,"ReleaseTime":"2015-06-10","AvgBoxOffice":38,"BoxOffice":1420732578,"Irank":46,"EnMovieID":348959},
{"MovieName":"蜘蛛侠:英雄远征","AvgAudienceCount":17,"ReleaseTime":"2019-06-28","AvgBoxOffice":36,"BoxOffice":1417682748,"Irank":47,"EnMovieID":682139},
{"MovieName":"头号玩家","AvgAudienceCount":18,"ReleaseTime":"2018-03-30","AvgBoxOffice":36,"BoxOffice":1396660613,"Irank":48,"EnMovieID":657862},
{"MovieName":"速度与激情9","AvgAudienceCount":13,"ReleaseTime":"2021-05-21","AvgBoxOffice":39,"BoxOffice":1392333894,"Irank":49,"EnMovieID":682199},
{"MovieName":"后来的我们","AvgAudienceCount":21,"ReleaseTime":"2018-04-28","AvgBoxOffice":34,"BoxOffice":1361525311,"Irank":50,"EnMovieID":663327}]}}

# Request the top 500 instead of the top 50 shown by the front end
url = "https://ys.endata.cn/enlib-api/api/home/getrank_mainland.do"
myData = {'top': '500', 'type': '1'}
res = requests.post(url, data=myData)
res = res.json()
print(len(res["data"]["table0"]))

# Build the rows first, then create the DataFrame in one go
records = [{"MovieName": each["MovieName"],
            "ReleaseTime": each["ReleaseTime"].split("-")[0],  # keep only the year
            "BoxOffice": each["BoxOffice"]}
           for each in res["data"]["table0"]]
dataDf = pd.DataFrame(records)

# kind="barh": horizontal bar chart
# figsize: figure size
# legend=False: hide the legend
# color: bar colors
# fontsize: font size of the tick labels
# [::-1]: reverse the order, so the top-ranked film is drawn at the top
barColors = ["grey", "gold", "darkviolet", "turquoise", "r", "g", "b", "c",
             "k", "darkorange", "lightgreen", "plum", "tan", "khaki",
             "pink", "skyblue", "lawngreen", "salmon"]
dataDf[:20][::-1].plot(x="MovieName", y="BoxOffice", kind="barh", figsize=(16, 8), fontsize=15,
                       legend=False, color=barColors)
plt.savefig("before", dpi=500)

# Distribution of release years
print(dataDf["ReleaseTime"].value_counts())

# Price levels with 1990 as the base
priceLevel = np.array([1/1000, 1/350, 1/157, 1/100]) * 1000
plt.figure()  # separate figure for the raw points
plt.scatter([1990, 2000, 2010, 2020], priceLevel)

# Fit a straight line; x is measured in years since 1990
ws = standRegres([[1, 0], [1, 10], [1, 20], [1, 30]], priceLevel)
plt.figure()  # separate figure for the fit
plt.scatter([0, 10, 20, 30], priceLevel, color="r")
x = np.linspace(0, 30)
y = float(ws[0]) + float(ws[1]) * x
plt.xticks([0, 10, 20, 30], [1990, 2000, 2010, 2020])
plt.plot(x, y)

# Fitted price level for each year from 2000 to 2021
priceLevelNew = []
for each in range(10, 32):
    priceLevelNew.append(float(ws[0]) + float(ws[1]) * each)

# Match each film to the price level of its release year
dataDf["priceLevel"] = dataDf["ReleaseTime"].astype(int).map(lambda year: priceLevelNew[year - 2000])

# priceLevelNew[-1] is the fitted 2021 price level, approximately 9.938608
dataDf["newBoxOffice"] = dataDf["BoxOffice"] * priceLevelNew[-1] / dataDf["priceLevel"]
dataDf = dataDf.sort_values(by="newBoxOffice", ascending=False)
dataDf

dataDf[:20][::-1].plot(x="MovieName", y="newBoxOffice", kind="barh", figsize=(16, 8), fontsize=15,
                       legend=False, color=barColors)
plt.savefig("after", dpi=500)