允许在HAVINg子句中使用别名的性能影响

允许在HAVINg子句中使用别名的性能影响,第1张

允许在HAVINg子句中使用别名的性能影响

仅仅专注于特定查询,并在下面加载示例数据。这确实解决了其他一些查询,例如

count(distinct ...)
其他人提到的查询。

alias in the HAVINg
出现要么略微优于或相当多的优于其替代(取决于查询)。

这使用一个预先存在的表,其中大约有500万行是通过我的回答快速创建的,耗时3至5分钟。

结果结构:

CREATE TABLE `ratings` (  `id` int(11) NOT NULL AUTO_INCREMENT,  `thing` int(11) NOT NULL,  PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

但是使用INNODB代替。由于范围保留插入而创建预期的INNODB间隙异常。只是说而已,没有区别。470万行。

修改表以接近Tim的假定架构。

rename table ratings to students; -- not exactly instanteous (a COPY)alter table students add column camId int; -- get it near Tim's schema-- don't add the `camId` index yet

以下将花费一些时间。一次又一次地大块运行它,否则您的连接可能会超时。超时是由于update语句中有500万行没有LIMIT子句。注意,我们 确实
有一个LIMIT子句。

因此,我们将进行半百万次的行迭代。将列设置为1到20之间的随机数

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

继续运行以上内容,直到no

camId
为null。

我跑了10次(整个过程需要7到10分钟)

select camId,count(*) from studentsgroup by camId order by 1 ;1   2356412   2360603   2362494   2357365   2363336   2355407   2358708   2368159   23595010  23559411  23650412  23648313  23565614  23626415  23605016  23617617  23609718  23523919  23555620  234779select count(*) from students;-- 4.7 Million rows

创建一个有用的索引(当然是在插入之后)。

create index `ix_stu_cam` on students(camId); -- takes 45 secondsANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html-- the above is fine, takes 1 second

创建校园表。

create table campus(   camID int auto_increment primary key,    camName varchar(100) not null);insert campus(camName) values('one'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('ten'),('etc'),('etc'),('etc'),('etc'),('etc'),('etc'),('etc'),('etc'),('etc'),('twenty');-- ok 20 of them

运行两个查询:

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount FROM students JOIN campus     ON campus.camID = students.camID GROUP BY students.camID, campus.camName HAVINg COUNT(students.id) > 3 ORDER BY studentCount; -- run it many many times, back to back, 5.50 seconds, 20 rows of output

SELECt students.camID, campus.camName, COUNT(students.id) as studentCount FROM students JOIN campus     ON campus.camID = students.camID GROUP BY students.camID, campus.camName HAVINg studentCount > 3 ORDER BY studentCount; -- run it many many times, back to back, 5.50 seconds, 20 rows of output

所以时代是一样的。每跑十遍。

两者的

EXPLAIN
输出相同

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+| id | select_type | table    | type | possible_keys | key        | key_len | ref       | rows   | Extra     |+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+|  1 | SIMPLE      | campus   | ALL  | PRIMARY       | NULL       | NULL    | NULL      |     20 | Using temporary; Using filesort ||  1 | SIMPLE      | students | ref  | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index          |+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

使用AVG()函数,通过以下两个查询

having
(具有相同的
EXPLAIN
输出)中的别名,我的性能提高了约12%。

SELECT students.camID, campus.camName, avg(students.id) as studentAvg FROM students JOIN campus     ON campus.camID = students.camID GROUP BY students.camID, campus.camName HAVINg avg(students.id) > 2200000 ORDER BY students.camID; -- avg time 7.5explainSELECt students.camID, campus.camName, avg(students.id) as studentAvg FROM students JOIN campus     ON campus.camID = students.camID GROUP BY students.camID, campus.camName HAVINg studentAvg > 2200000ORDER BY students.camID;-- avg time 6.5

最后是

DISTINCT

SELECt students.camID, count(distinct students.id) as studentDistinct FROM students JOIN campus     ON campus.camID = students.camID GROUP BY students.camID HAVINg count(distinct students.id) > 1000000 ORDER BY students.camID; -- 10.6   10.84   12.1   11.49   10.1   9.97   10.27   11.53   9.84 9.98-- 9.9 SELECt students.camID, count(distinct students.id) as studentDistinct  FROM students  JOIN campus     ON campus.camID = students.camID  GROUP BY students.camID  HAVINg studentDistinct > 1000000  ORDER BY students.camID; -- 6.81    6.55   6.75   6.31   7.11 6.36   6.55-- 6.45

具有相同输出的别名持续运行
速度提高35%

EXPLAIN
。见下文。因此,相同的Explain输出已显示两次,但并不能得出相同的性能,但这只是一般的提示。

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+| id | select_type | table    | type  | possible_keys | key        | key_len | ref       | rows   | Extra       |+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+|  1 | SIMPLE      | campus   | index | PRIMARY       | PRIMARY    | 4       | NULL      |     20 | Using index; Using temporary; Using filesort ||  1 | SIMPLE      | students | ref   | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index |+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

目前,Optimizer似乎更喜欢别名,尤其是对于

DISTINCT.



欢迎分享,转载请注明来源:内存溢出

原文地址: https://outofmemory.cn/zaji/5063230.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-11-16
下一篇 2022-11-16

发表评论

登录后才能评论

评论列表(0条)

保存