explain返回行是通过对应表的列的统计信息来评估的,在系统表pg_stats中。
建表,插入测试数据:
postgres=# create table tb15(ID integer,name character varying);CREATE tablepostgres=# insert into tb15 select generate_serIEs(1,10000),'john';INSERT 0 10000postgres=# analyze tb15;ANALYZE例1. where条件完全落在柱状图的边界值。
postgres=# explain(analyze,verbose,buffers)select * from tb15 where ID<1000; query PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on public.tb15 (cost=0.00..180.00 rows=1000 wIDth=9) (actual time=0.009..0.821 rows=999 loops=1) Output: ID,name Filter: (tb15.ID < 1000) Rows Removed by Filter: 9001 Buffers: shared hit=55 Total runtime: 0.901 ms(6 rows)
postgres=# \xExpanded display is on
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname | publictablename | tb15attname | IDinherited | fnull_frac | 0avg_wIDth | 4<strong>n_distinct | -1</strong>most_common_vals | most_common_freqs | <strong>histogram_bounds | {1,100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,10000}</strong>correlation | 1most_common_elems | most_common_elem_freqs | elem_count_histogram |当where条件不是等于的时候,且n_distinct为-1的时候(表示该字段的值都是唯一的),可以参考pg_stats表中的histogram_bounds字段的统计值,也就是表字段的柱状图。
histogram_bounds 的值受参数 default_statistics_target 控制。default_statistics_target默认值为100,default_statistics_target参数值越大,那么PG搜集的列上的统计信息就越精确,当然在表做 Analyze *** 作时花费的时间也稍长些。
postgres=# show default_statistics_target ; default_statistics_target --------------------------- 100(1 row)这里提一下也可以修改字段的统计的default_statistics_target值
postgres=# \h alter table Command: ALTER tableDescription: change the deFinition of a tableSyntax:ALTER table [ IF EXISTS ] [ ONLY ] name [ * ] action [,... ]ALTER table [ IF EXISTS ] [ ONLY ] name [ * ] REname [ ColUMN ] column_name TO new_column_nameALTER table [ IF EXISTS ] [ ONLY ] name [ * ] REname CONSTRAINT constraint_name TO new_constraint_nameALTER table [ IF EXISTS ] name REname TO new_nameALTER table [ IF EXISTS ] name SET SCHEMA new_schemawhere action is one of: ADD [ ColUMN ] column_name data_type [ ColLATE collation ] [ column_constraint [ ... ] ] DROP [ ColUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ] ALTER [ ColUMN ] column_name [ SET DATA ] TYPE data_type [ ColLATE collation ] [ USING Expression ] ALTER [ ColUMN ] column_name SET DEFAulT Expression ALTER [ ColUMN ] column_name DROP DEFAulT ALTER [ ColUMN ] column_name { SET | DROP } NOT NulL <span >ALTER [ ColUMN ] column_name SET STATISTICS integer --设置字段的default_statistics_target值</span> ALTER [ ColUMN ] column_name SET ( attribute_option = value [,... ] ) ALTER [ ColUMN ] column_name reset ( attribute_option [,... ] ) ALTER [ ColUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN } ADD table_constraint [ NOT VALID ] ... .. .分析rows=1000是怎么来的?
观察上面的柱状图,ID<1000,完全落在了前面10个区间内(1,1000),那么 rows=(10/100)*10000=1000.
10: 符合条件ID<1000的10个区间;
100:总共100个区间;
10000:10000行数据。
例2. where条件落在柱状图的非边界值。
postgres=# explain(analyze,buffers)select * from tb15 where ID<920; query PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15 (cost=0.00..180.00 rows=920 wIDth=9) (actual time=0.013..2.352 rows=919 loops=1) Output: ID,name Filter: (tb15.ID < 920) Rows Removed by Filter: 9081 Buffers: shared hit=55 Total runtime: 2.453 ms(6 rows)现在条件是ID<920,观察上面的柱状图,ID<900的记录完整落在柱状图的前面9个区间里(1,900),ID>=900 and ID<920的落在了第10个区间,这个时候rows=920的计算方法是: rows=((9+(920-900)/(1000-900.0))/100)*10000=920.
9: 符合条件ID<900的9个区间;
(920-900):符合条件ID>=900 and ID<920的记录数;
(1000-900.0):第10个区间的总记录;
(920-900)/(1000-900.0): 符合ID>=900 and ID<920的在该区间的百分比;
100:总共100个区间;
10000:10000行数据。
先添加几个高频值:
postgres=# insert into tb15 select 1,'join' from generate_serIEs(1,500);INSERT 0 500postgres=# insert into tb15 select 2,300);INSERT 0 300postgres=# insert into tb15 select 3,200);INSERT 0 200postgres=# insert into tb15 select 4,100);INSERT 0 100postgres=# analyze tb15;ANALYZE
<pre name="code" >postgres=# select relname,relpages,reltuples from pg_class where relname='tb15'; relname | relpages | reltuples ---------+----------+----------- tb15 | 60 | 11100(1 row)
现在总共11100条记录。
看一个执行计划:postgres=# explain(analyze,buffers)select * from tb15 where ID=1; query PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15 (cost=0.00..198.75 rows=501 wIDth=9) (actual time=0.008..0.922 rows=501 loops=1) Output: ID,name Filter: (tb15.ID = 1) Rows Removed by Filter: 10599 Buffers: shared hit=60 Total runtime: 0.962 ms(6 rows)分析rows=501是怎么来的?
首先查看该表的该字段的统计信息:
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname | publictablename | tb15attname | IDinherited | fnull_frac | 0avg_wIDth | 4n_distinct | -0.900901most_common_vals | {1,2,3,4} --高频值most_common_freqs | {0.0451351,0.0271171,0.0181081,0.0090991} --高频值对应的占比<span >histogram_bounds | {5,104,204,304,404,504,604,704,804,904,1004,1104,1204,1304,1404,1504,1604,1704,1804,1904,2004,2103,2203,2303,2403,2503,2603,2703,2803,2903,3003,3103,3203,3303,3403,3503,3603,3703,3803,3903,4003,4102,4202,4302,4402,4502,4602,4702,4802,4902,5002,5102,5202,5302,5402,5502,5602,5702,5802,5902,6002,6101,6201,6301,6401,6501,6601,6701,6801,6901,7001,7101,7201,7301,7401,7501,7601,7701,7801,7901,8001,10000}</span>correlation | 0.464526most_common_elems | most_common_elem_freqs | elem_count_histogram |可以看到柱状图是从5开始的,柱状图已经把高频值排除在外。
如果where条件是高频值里面的,就直接通过高频值对应的占比来计算返回的行数, rows = 11100*0.0451351=500.9996100=501。
第二个例子
postgres=# explain(analyze,buffers)select * from tb15 where ID<3; query PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15 (cost=0.00..198.75 rows=803 wIDth=9) (actual time=0.010..1.174 rows=802 loops=1) Output: ID,name Filter: (tb15.ID < 3) Rows Removed by Filter: 10298 Buffers: shared hit=60 Total runtime: 1.233 ms(6 rows)
rows=11100*(0.0451351+0.0271171)=801.9994200;
postgres=# explain(analyze,buffers)select * from tb15 where ID=1 or ID>9900; query PLAN ----------------------------------------------------------------------------------------------------------- Seq Scan on public.tb15 (cost=0.00..226.50 rows=596 wIDth=9) (actual time=0.009..1.673 rows=601 loops=1) Output: ID,name Filter: ((tb15.ID = 1) OR (tb15.ID > 9900)) Rows Removed by Filter: 10499 Buffers: shared hit=60 Total runtime: 1.726 ms(6 rows)rows = 11100*(0.0451351)+1/100.0*(1-0.0451351-0.0271171-0.0181081-0.0090991)*11100=600.9596166
例3:多个列查询条件的选择性相乘评估例子
postgres=# explain select * from tb15 where ID=1 and name='john'; query PLAN -------------------------------------------------------- Seq Scan on tb15 (cost=0.00..226.50 rows=451 wIDth=9) Filter: ((ID = 1) AND ((name)::text = 'john'::text))(2 rows
postgres=# select * from pg_stats where tablename ='tb15' and attname='ID';--ID字段的统计信息-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------schemaname | publictablename | tb15attname | IDinherited | fnull_frac | 0avg_wIDth | 4n_distinct | -0.900901<span >most_common_vals | {1,4}most_common_freqs | {0.0451351,0.0090991}</span>histogram_bounds | {5,10000}correlation | 0.464526most_common_elems | most_common_elem_freqs | elem_count_histogram |
postgres=# select * from pg_stats where tablename ='tb15' and attname='name'; --name字段的统计信息-[ RECORD 1 ]----------+---------------------schemaname | publictablename | tb15attname | nameinherited | fnull_frac | 0avg_wIDth | 5n_distinct | 2most_common_vals | {john,join}most_common_freqs | {0.900901,0.0990991}histogram_bounds | correlation | 1most_common_elems | most_common_elem_freqs | elem_count_histogram |
rows = 11100*(0.0451351*0.900901)=451.3510496486100。
总结以上是内存溢出为你收集整理的PostgreSQL explain返回行的评估全部内容,希望文章能够帮你解决PostgreSQL explain返回行的评估所遇到的程序开发问题。
如果觉得内存溢出网站内容还不错,欢迎将内存溢出网站推荐给程序员好友。
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)