PostgreSQL explain返回行的评估

PostgreSQL explain返回行的评估,第1张

概述explain返回行是通过对应表的列的统计信息来评估的,在系统表pg_stats中。 建表,插入测试数据: postgres=# create table tb15(id integer,name character varying);CREATE TABLEpostgres=# insert into tb15 select generate_series(1,10000),'john';

explain返回行是通过对应表的列的统计信息来评估的,在系统表pg_stats中。

建表,插入测试数据:

postgres=# create table tb15(ID integer,name character varying);CREATE tablepostgres=# insert into tb15 select generate_serIEs(1,10000),'john';INSERT 0 10000postgres=# analyze tb15;ANALYZE
例1. where条件完全落在柱状图的边界值。
postgres=# explain(analyze,verbose,buffers)select * from tb15 where ID<1000;                                                 query PLAN                                                 ------------------------------------------------------------------------------------------------------------ Seq Scan on public.tb15  (cost=0.00..180.00 rows=1000 wIDth=9) (actual time=0.009..0.821 rows=999 loops=1)   Output: ID,name   Filter: (tb15.ID < 1000)   Rows Removed by Filter: 9001   Buffers: shared hit=55 Total runtime: 0.901 ms(6 rows)
postgres=# \xExpanded display is on
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname             | publictablename              | tb15attname                | IDinherited              | fnull_frac              | 0avg_wIDth              | 4<strong>n_distinct             | -1</strong>most_common_vals       | most_common_freqs      | <strong>histogram_bounds       | {1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}</strong>correlation            | 1most_common_elems      | most_common_elem_freqs | elem_count_histogram   | 
当where条件不是等于的时候,且n_distinct为-1的时候(表示该字段的值都是唯一的),可以参考pg_stats表中的histogram_bounds字段的统计值,也就是表字段的柱状图。
histogram_bounds 的值受参数 default_statistics_target 控制。default_statistics_target默认值为100,default_statistics_target参数值越大,那么PG搜集的列上的统计信息就越精确,当然在表做 Analyze *** 作时花费的时间也稍长些。
postgres=# show default_statistics_target ; default_statistics_target --------------------------- 100(1 row)
这里提一下也可以修改字段的统计的default_statistics_target值
postgres=# \h alter table Command:     ALTER tableDescription: change the deFinition of a tableSyntax:ALTER table [ IF EXISTS ] [ ONLY ] name [ * ]    action [,... ]ALTER table [ IF EXISTS ] [ ONLY ] name [ * ]    REname [ ColUMN ] column_name TO new_column_nameALTER table [ IF EXISTS ] [ ONLY ] name [ * ]    REname CONSTRAINT constraint_name TO new_constraint_nameALTER table [ IF EXISTS ] name    REname TO new_nameALTER table [ IF EXISTS ] name    SET SCHEMA new_schemawhere action is one of:    ADD [ ColUMN ] column_name data_type [ ColLATE collation ] [ column_constraint [ ... ] ]    DROP [ ColUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]    ALTER [ ColUMN ] column_name [ SET DATA ] TYPE data_type [ ColLATE collation ] [ USING Expression ]    ALTER [ ColUMN ] column_name SET DEFAulT Expression    ALTER [ ColUMN ] column_name DROP DEFAulT    ALTER [ ColUMN ] column_name { SET | DROP } NOT NulL    <span >ALTER [ ColUMN ] column_name SET STATISTICS integer --设置字段的default_statistics_target值</span>    ALTER [ ColUMN ] column_name SET ( attribute_option = value [,... ] )    ALTER [ ColUMN ] column_name reset ( attribute_option [,... ] )    ALTER [ ColUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }    ADD table_constraint [ NOT VALID ]    ...    ..    .
分析rows=1000是怎么来的?
观察上面的柱状图,ID<1000,完全落在了前面10个区间内(1,1000),那么 rows=(10/100)*10000=1000.
10: 符合条件ID<1000的10个区间;
100:总共100个区间;
10000:10000行数据。

例2. where条件落在柱状图的非边界值。
postgres=# explain(analyze,buffers)select * from tb15 where ID<920;                                                query PLAN                                                 ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15  (cost=0.00..180.00 rows=920 wIDth=9) (actual time=0.013..2.352 rows=919 loops=1)   Output: ID,name   Filter: (tb15.ID < 920)   Rows Removed by Filter: 9081   Buffers: shared hit=55 Total runtime: 2.453 ms(6 rows)
现在条件是ID<920,观察上面的柱状图,ID<900的记录完整落在柱状图的前面9个区间里(1,900),ID>=900 and ID<920的落在了第10个区间,这个时候rows=920的计算方法是: rows=((9+(920-900)/(1000-900.0))/100)*10000=920.
9: 符合条件ID<900的9个区间;
(920-900):符合条件ID>=900 and ID<920的记录数;
(1000-900.0):第10个区间的总记录;
(920-900)/(1000-900.0): 符合ID>=900 and ID<920的在该区间的百分比;
100:总共100个区间;

10000:10000行数据。


例3:有高频值
先添加几个高频值:
postgres=# insert into tb15 select 1,'join' from generate_serIEs(1,500);INSERT 0 500postgres=# insert into tb15 select 2,300);INSERT 0 300postgres=# insert into tb15 select 3,200);INSERT 0 200postgres=# insert into tb15 select 4,100);INSERT 0 100postgres=# analyze tb15;ANALYZE
<pre name="code" >postgres=# select relname,relpages,reltuples from pg_class  where relname='tb15'; relname | relpages | reltuples ---------+----------+----------- tb15    |       60 |     11100(1 row)
 

现在总共11100条记录。

看一个执行计划:
postgres=# explain(analyze,buffers)select * from tb15 where ID=1;                                                query PLAN                                                 ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15  (cost=0.00..198.75 rows=501 wIDth=9) (actual time=0.008..0.922 rows=501 loops=1)   Output: ID,name   Filter: (tb15.ID = 1)   Rows Removed by Filter: 10599   Buffers: shared hit=60 Total runtime: 0.962 ms(6 rows)
分析rows=501是怎么来的?
首先查看该表的该字段的统计信息:
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname             | publictablename              | tb15attname                | IDinherited              | fnull_frac              | 0avg_wIDth              | 4n_distinct             | -0.900901most_common_vals       | {1,2,3,4} --高频值most_common_freqs      | {0.0451351,0.0271171,0.0181081,0.0090991} --高频值对应的占比<span >histogram_bounds       | {5,104,204,304,404,504,604,704,804,904,1004,1104,1204,1304,1404,1504,1604,1704,1804,1904,2004,2103,2203,2303,2403,2503,2603,2703,2803,2903,3003,3103,3203,3303,3403,3503,3603,3703,3803,3903,4003,4102,4202,4302,4402,4502,4602,4702,4802,4902,5002,5102,5202,5302,5402,5502,5602,5702,5802,5902,6002,6101,6201,6301,6401,6501,6601,6701,6801,6901,7001,7101,7201,7301,7401,7501,7601,7701,7801,7901,8001,10000}</span>correlation            | 0.464526most_common_elems      | most_common_elem_freqs | elem_count_histogram   | 
可以看到柱状图是从5开始的,柱状图已经把高频值排除在外。
如果where条件是高频值里面的,就直接通过高频值对应的占比来计算返回的行数, rows = 11100*0.0451351=500.9996100=501。

第二个例子
postgres=# explain(analyze,buffers)select * from tb15 where ID<3;                                                query PLAN                                                 ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15  (cost=0.00..198.75 rows=803 wIDth=9) (actual time=0.010..1.174 rows=802 loops=1)   Output: ID,name   Filter: (tb15.ID < 3)   Rows Removed by Filter: 10298   Buffers: shared hit=60 Total runtime: 1.233 ms(6 rows)

rows=11100*(0.0451351+0.0271171)=801.9994200;


第三个例子:条件中部分值在高频值内
postgres=# explain(analyze,buffers)select * from tb15 where ID=1 or ID>9900;                                                query PLAN                                                 ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15  (cost=0.00..226.50 rows=596 wIDth=9) (actual time=0.009..1.673 rows=601 loops=1)   Output: ID,name   Filter: ((tb15.ID = 1) OR (tb15.ID > 9900))   Rows Removed by Filter: 10499   Buffers: shared hit=60 Total runtime: 1.726 ms(6 rows)
rows = 11100*(0.0451351)+1/100.0*(1-0.0451351-0.0271171-0.0181081-0.0090991)*11100=600.9596166


例3:多个列查询条件的选择性相乘评估例子
postgres=# explain select * from tb15 where ID=1 and name='john';                       query PLAN                       -------------------------------------------------------- Seq Scan on tb15  (cost=0.00..226.50 rows=451 wIDth=9)   Filter: ((ID = 1) AND ((name)::text = 'john'::text))(2 rows
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';--ID字段的统计信息-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname             | publictablename              | tb15attname                | IDinherited              | fnull_frac              | 0avg_wIDth              | 4n_distinct             | -0.900901<span >most_common_vals       | {1,4}most_common_freqs      | {0.0451351,0.0090991}</span>histogram_bounds       | {5,10000}correlation            | 0.464526most_common_elems      | most_common_elem_freqs | elem_count_histogram   |
postgres=# select * from pg_stats where tablename ='tb15' and attname='name'; --name字段的统计信息-[ RECORD 1 ]----------+---------------------schemaname             | publictablename              | tb15attname                | nameinherited              | fnull_frac              | 0avg_wIDth              | 5n_distinct             | 2most_common_vals       | {john,join}most_common_freqs      | {0.900901,0.0990991}histogram_bounds       | correlation            | 1most_common_elems      | most_common_elem_freqs | elem_count_histogram   |

rows = 11100*(0.0451351*0.900901)=451.3510496486100。

总结

以上是内存溢出为你收集整理的PostgreSQL explain返回行的评估全部内容,希望文章能够帮你解决PostgreSQL explain返回行的评估所遇到的程序开发问题。

如果觉得内存溢出网站内容还不错,欢迎将内存溢出网站推荐给程序员好友。

欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/sjk/1176135.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-06-02
下一篇 2022-06-02

发表评论

登录后才能评论

评论列表(0条)

保存