Python: Visual Analysis of QQ Music Popular and Surging Chart Data


I. Background

  Music is now part of everyday life. We listen in many settings: while doing homework, on the commute to work, during a midday nap, and so on. That makes both newly released hits and songs whose popularity is rising fast interesting to follow, which is why I chose this topic.

II. Crawler Design

Name: QQ Music popular and surging chart data crawling

Scope: visit the QQ Music web site, scrape the information for the corresponding charts, and finally save it for visualization and analysis.

Design approach:

  First, use requests to fetch the page.

  Next, use lxml's etree to parse the page content.

  Finally, save the data with file operations.

Technical difficulty: each chart must be crawled separately, which makes the workload fairly large.
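The three steps above can be sketched roughly as follows. To keep the sketch runnable offline, the live requests.get call is replaced by a hard-coded page; the markup, file name, and column names here are illustrative only, not QQ Music's real structure.

```python
# Step 1 (fetch) is stubbed with a literal string; in the real crawler this
# text comes from requests.get(url, headers=headers).text.
import csv
from lxml import etree

page_text = """<ul>
  <li><div><div>1</div><div>95%</div></div></li>
  <li><div><div>2</div><div>88%</div></div></li>
</ul>"""

# Step 2 (parse): build a DOM and pull out text nodes with XPath.
html = etree.HTML(page_text)
ranks = html.xpath("//ul/li/div/div[1]/text()")
heats = html.xpath("//ul/li/div/div[2]/text()")

# Step 3 (save): write the rows to a CSV file.
with open("demo_chart.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["rank", "heat"])
    writer.writerows(zip(ranks, heats))
```

The real crawler follows the same fetch, parse, save shape, only with the live page and much longer XPath expressions.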

III. Page Structure Analysis

Page analysis:

Content-navigation type

 

 

Target feature analysis (HTML analysis):

Rank, popularity, and track duration:

 

 

Song title:

 

Artist:

 

 

 Node lookup:

 

 

 

QQ_muc_pop = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[1]/text()".format(pop))
QQ_muc_up = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[2]/text()".format(pop))
QQ_muc_name = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[3]/span/a[2]/text()".format(pop))
QQ_muc_singer = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[4]/a/text()".format(pop))
QQ_muc_time = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[5]/text()".format(pop))

Traversal: a for loop advances the list-item index in the XPath, visiting and extracting one chart row at a time.
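A minimal sketch of this indexed traversal; the markup below is a simplified stand-in for QQ Music's real page, and the song names are illustrative. Formatting the row index into the XPath selects one li element per pass, just as the crawler does.

```python
# Each pass formats the 1-based row index into the XPath, selecting a
# single <li>; xpath() returns a list, so the inner loop unwraps it,
# mirroring the pattern used in the crawler below.
from lxml import etree

page = ("<ul><li><span>Song A</span></li>"
        "<li><span>Song B</span></li>"
        "<li><span>Song C</span></li></ul>")
html = etree.HTML(page)

names = []
for i in range(1, 4):
    hit = html.xpath("//ul/li[{}]/span/text()".format(i))
    for item in hit:  # unwrap the one-element result list
        names.append(item)
```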

 IV. Crawler Implementation

Data crawling and collection:

import random

import requests
from lxml import etree

# Desktop browser User-Agent strings; one is chosen at random per run.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.16 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1623.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.17 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.62 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1468.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1467.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1464.0 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1500.55 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; rv:27.3) Gecko/20130101 Firefox/27.3',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:27.0) Gecko/20121011 Firefox/27.0',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0',
    'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20130405 Firefox/23.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130406 Firefox/23.0',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:23.0) Gecko/20131011 Firefox/23.0',
    'Mozilla/5.0 (Windows NT 6.2; rv:22.0) Gecko/20130405 Firefox/22.0',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:22.0) Gecko/20130328 Firefox/22.0',
    'Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20130405 Firefox/22.0',
    'Mozilla/5.0 (Microsoft Windows NT 6.2.9200.0; rv:22.0) Gecko/20130405 Firefox/22.0',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/21.0.1',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/21.0.1',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:21.0.0) Gecko/20121011 Firefox/21.0.0',
    'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20130514 Firefox/21.0',
]

headers = {
    'User-Agent': random.choice(USER_AGENTS),
    'Connection': 'keep-alive',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
}

# QQ Music surging chart
def QQ_muc_up():
    # Fetch the chart page
    url = 'https://y.qq.com/n/ryqq/topList/62'
    res = requests.get(url, headers=headers)
    res.encoding = 'utf-8'
    HTML = etree.HTML(res.text)
    # (Re)create the output file and write the CSV header
    file = open("QQ_muc_up.csv", "w", encoding='utf-8')
    file.write("QQ_muc_pop,QQ_muc_up,QQ_muc_name,QQ_muc_singer,QQ_muc_time\n")
    file.close()
    # Rank QQ_muc_pop, surge index QQ_muc_up, title QQ_muc_name,
    # artist QQ_muc_singer, duration QQ_muc_time
    pop = 1
    for i in range(1, 21):
        QQ_muc_pop = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[1]/text()".format(pop))
        for item in QQ_muc_pop:
            QQ_muc_pop = item
        QQ_muc_up = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[2]/text()".format(pop))
        for item in QQ_muc_up:
            QQ_muc_up = int(item.strip('%'))
        QQ_muc_name = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[3]/span/a[2]/text()".format(pop))
        for item in QQ_muc_name:
            QQ_muc_name = item
        QQ_muc_singer = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[4]/a/text()".format(pop))
        for item in QQ_muc_singer:
            QQ_muc_singer = item
        QQ_muc_time = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[5]/text()".format(pop))
        for item in QQ_muc_time:
            QQ_muc_time = item
        pop += 1
        # Append this row to the CSV file
        with open('QQ_muc_up.csv', "a", encoding='utf-8') as file1:
            file1.writelines(QQ_muc_pop + "," + str(QQ_muc_up) + "," + QQ_muc_name + "," + QQ_muc_singer + "," + QQ_muc_time + '\n')
        print('Title:', QQ_muc_name, '\n', 'Rank:', QQ_muc_pop, '\n', 'Surge index:', QQ_muc_up, '\n', 'Artist:', QQ_muc_singer, '\n', 'Duration:', QQ_muc_time)

# QQ Music popular chart
def QQ_muc_fasion():
    # Fetch the chart page
    url = 'https://y.qq.com/n/ryqq/topList/4'
    res = requests.get(url, headers=headers)
    res.encoding = 'utf-8'
    HTML = etree.HTML(res.text)
    # (Re)create the output file and write the CSV header
    file = open("QQ_muc_fasion.csv", "w", encoding='utf-8')
    file.write("QQ_muc_pop,QQ_muc_up,QQ_muc_name,QQ_muc_singer,QQ_muc_time\n")
    file.close()
    # Rank QQ_muc_pop, popularity index QQ_muc_up, title QQ_muc_name,
    # artist QQ_muc_singer, duration QQ_muc_time
    pop = 1
    for i in range(1, 21):
        QQ_muc_pop = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[1]/text()".format(pop))
        for item in QQ_muc_pop:
            QQ_muc_pop = item
        QQ_muc_up = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[2]/text()".format(pop))
        for item in QQ_muc_up:
            QQ_muc_up = int(item.strip('%'))
        QQ_muc_name = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[3]/span/a[2]/text()".format(pop))
        for item in QQ_muc_name:
            QQ_muc_name = item
        QQ_muc_singer = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[4]/a/text()".format(pop))
        for item in QQ_muc_singer:
            QQ_muc_singer = item
        QQ_muc_time = HTML.xpath("//*[@id='app']/div/div[2]/div[2]/div[3]/ul[2]/li[{}]/div/div[5]/text()".format(pop))
        for item in QQ_muc_time:
            QQ_muc_time = item
        pop += 1
        # Append this row to the CSV file
        with open('QQ_muc_fasion.csv', "a", encoding='utf-8') as file1:
            file1.writelines(QQ_muc_pop + "," + str(QQ_muc_up) + "," + QQ_muc_name + "," + QQ_muc_singer + "," + QQ_muc_time + '\n')
        print('Title:', QQ_muc_name, '\n', 'Rank:', QQ_muc_pop, '\n', 'Popularity index:', QQ_muc_up, '\n', 'Artist:', QQ_muc_singer, '\n', 'Duration:', QQ_muc_time)

if __name__ == '__main__':
    print('-------------------start----------------------')
    print('Crawling the QQ Music surging chart')
    QQ_muc_up()
    print('-------------------divider---------------------')
    print('Crawling the QQ Music popular chart')
    QQ_muc_fasion()
    print('--------------------end------------------------')

Data cleaning:

 

import pandas as pd
import numpy as np

Fasion = pd.read_csv(r'D:\HW\QQ_muc_fasion.csv', error_bad_lines=False)
Up = pd.read_csv(r'D:\HW\QQ_muc_up.csv', error_bad_lines=False)
Fasion

 

 

 

 

# Drop duplicate rows
Fasion = Fasion.drop_duplicates()
Up = Up.drop_duplicates()
# Drop rows containing NaN
Fasion = Fasion.dropna(axis=0)
Up = Up.dropna(axis=0)
# Drop the duration column, which is not analyzed
del Up['QQ_muc_time']
del Fasion['QQ_muc_time']
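The same cleaning steps can be seen on a small hand-made DataFrame; the values below are made up purely for illustration.

```python
# Toy DataFrame mirroring the scraped chart columns, with one duplicate
# row and one NaN row to show what each cleaning step removes.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "QQ_muc_name": ["Song A", "Song A", "Song B", "Song C"],
    "QQ_muc_up":   [95,       95,       np.nan,   80],
    "QQ_muc_time": ["03:21",  "03:21",  "04:02",  "02:58"],
})

df = df.drop_duplicates()   # removes the second "Song A" row (exact duplicate)
df = df.dropna(axis=0)      # removes the "Song B" row (NaN popularity)
del df["QQ_muc_time"]       # the duration column is not analyzed
```

After these steps only the two complete, unique rows remain, with the duration column gone.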

 

 

import matplotlib.pyplot as plt

# Visual analysis; y's click counts are in units of 10,000
x = Fasion['QQ_muc_name']
y = Fasion['QQ_muc_up']
z = Up['QQ_muc_up']
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False
plt.plot(x, y, '-', color='r', label="popularity")
plt.xticks(rotation=90)
plt.legend(loc="best")  # legend
plt.title("QQ Music popular chart trend")
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False
plt.plot(x, z, '-', color='b', label="popularity")
plt.xticks(rotation=90)
plt.legend(loc="best")  # legend
plt.title("QQ Music surging chart trend")
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

# Bar chart
plt.bar(x, y, alpha=0.2, width=0.4, color='b', lw=3)
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.title("QQ Music popular chart bar chart")
plt.xticks(rotation=90)
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

# Bar chart
plt.bar(x, z, alpha=0.2, width=0.4, color='g', lw=3)
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.title("QQ Music surging chart bar chart")
plt.xticks(rotation=90)
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

# Horizontal bar chart
plt.barh(x, y, alpha=0.2, height=0.4, color='y', label="popularity index", lw=3)
plt.title("QQ Music popular chart horizontal bars")
plt.legend(loc="best")  # legend
plt.xlabel("Popularity")  # x-axis label
plt.ylabel("Song")        # y-axis label
plt.show()

 

# Horizontal bar chart
plt.barh(x, z, alpha=0.2, height=0.4, color='pink', label="popularity index", lw=3)
plt.title("QQ Music surging chart horizontal bars")
plt.legend(loc="best")  # legend
plt.xlabel("Popularity")  # x-axis label
plt.ylabel("Song")        # y-axis label
plt.show()

 

# Scatter plot
plt.scatter(x, y, color='pink', marker='o', s=40, edgecolor='black', alpha=0.5)
plt.xticks(rotation=90)
plt.title("QQ Music popular chart scatter plot")
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

# Scatter plot
plt.scatter(x, z, color='gray', marker='o', s=40, edgecolor='black', alpha=0.5)
plt.xticks(rotation=90)
plt.title("QQ Music surging chart scatter plot")
plt.xlabel("Song")        # x-axis label
plt.ylabel("Popularity")  # y-axis label
plt.show()

 

# Box plot
plt.boxplot(z)
plt.title("QQ Music surging chart box plot")
plt.show()

 

 

# Box plot
plt.boxplot(y)
plt.title("QQ Music popular chart box plot")
plt.show()

 

 Word clouds:

import pandas as pd
import numpy as np
import wordcloud as wc
from PIL import Image
import matplotlib.pyplot as plt

bk = np.array(Image.open("QQ.jpg"))
mask = bk
Fasion = pd.read_csv(r'D:\HW\QQ_muc_fasion.csv', error_bad_lines=False)
Up = pd.read_csv(r'D:\HW\QQ_muc_up.csv', error_bad_lines=False)
word_cloud = wc.WordCloud(
    width=1000,                # canvas width
    height=1000,               # canvas height
    mask=mask,                 # shape mask taken from the image
    background_color='white',  # background color
    font_path='msyhbd.ttc',    # a font with Chinese glyphs must be set locally
    max_font_size=400,         # largest font size
    random_state=50,           # seed for per-word PIL colors
)
text = Fasion['QQ_muc_singer']
Fasion = []
for i in text:
    Fasion.append(i)
text = " ".join(Fasion)
word_cloud.generate(text)
plt.imshow(word_cloud)
plt.show()

 

text = Up['QQ_muc_singer']
Up = []
for i in text:
    Up.append(i)
text = " ".join(Up)
word_cloud.generate(text)
plt.imshow(word_cloud)
plt.show()

 

 V. Summary

1. What conclusions can be drawn from the analysis and visualization, and was the goal met?

  The analysis and visualization show that song popularity gives a detailed picture of the latest chart standings. The expected goal was met.

2. What was gained in the course of this project, and what could be improved?

 In the course of the project I learned how to write a crawler, how to extract the desired content from a web page, and how to draw a word cloud. The main area for improvement is my limited coding experience, which made the programming a struggle; I hope to invest more time and effort into this weakness in my future work and studies.

 
