PostgreSQL *** 作符与优化器详解

PostgreSQL *** 作符与优化器详解,第1张

概述PostgreSQL 支持自定义 *** 作符,本质上是调用函数来实现的。 语法如下: 例如创建一个求两个值的平均值的 *** 作符: 首选要创建函数 postgres = # create function f_avg(numeric,numeric) returns numeric as $$ postgres$#   select ($1+$2)/2; postgres$# $$ language sql

Postgresql 支持自定义 *** 作符,本质上是调用函数来实现的。

语法如下:

例如创建一个求两个值的平均值的 *** 作符:

首选要创建函数

postgres=# create function f_avg(numeric,numeric) returns numeric as $$

postgres$# select ($1+$2)/2;

postgres$# $$ language sql strict;

CREATE FUNCTION

验证函数

postgres=# select f_avg(1,null);

f_avg

-------

(1 row)

postgres=# select f_avg(1,2);

f_avg

--------------------

1.5000000000000000

(1 row)

创建 *** 作符,指定左右参数类型,调用的函数名,commutator是一个和优化器相关的选项,我后面会重点介绍:

postgres=# create operator ## (procedure=f_avg,leftarg=numeric,rightarg=numeric,commutator='##');

CREATE OPERATOR

postgres=# select 1 ## 2;

?column?

--------------------

1.5000000000000000

(1 row)

注意到在创建 *** 作符的语法中有6个和优化器有关的关键字:

[,COMMUTATOR=com_op][,NEGATOR=neg_op]

[,RESTRICT=res_proc][,JOIN=join_proc]

[,HASHES][,MERGES]

介绍如下:

假设x表示 *** 作符左侧的参数,y表示 *** 作符右侧的参数

1. commutator,指明x op1 y等效于y op2 x,即 *** 作数调换,返回的值一样。例如2>1 和1<2结果是一致的。那么>就是<的commutator或者反之。又例如1+2和2+1是等价的,那么+就是+的commutator。commutator只需要在创建其中一个 *** 作符时指定,创建另一个对应的 *** 作符时可以不需要指定,Postgresql会自动建立这个关系。例如创建> *** 作符时指定了它的commutator是<,那么在创建< *** 作符时可以不需要指定>是它的commutator。

另外需要注意,有commutator *** 作符的 *** 作符的左右两侧的参数类型必须一致,这样才能满足x op1 y等价于y op2 x。

优化器如何利用commutator呢?例如索引扫描,必须列在 *** 作符的左侧才能使用索引。1 > tbl.c这个条件,如果>没有commutator的话,是不能使用索引的。

例子,以int4的>和< *** 作符为例,实验一下:

>和<在Postgresql中是一对commutator

postgres=# select oprcom::regoper from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

oprcom

--------------

pg_catalog.<

(1 row)

postgres=# select oprcom::regoper from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

oprcom

--------------

pg_catalog.>

(1 row)

记录他们的oprcom对应的OID

postgres=# select * from pg_operator where oprname='>' and oprcode='int4gt'::regproc;

oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc

ode | oprrest | oprjoin

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----

----+-------------+-----------------

> | 11 | 10 | b | f | f | 23 | 23 | 16 | 97 | 523 | int4

gt | scalargtsel | scalargtjoinsel

(1 row)

postgres=# select * from pg_operator where oprname='<' and oprcode='int4lt'::regproc;

oprname | oprnamespace | oprowner | oprkind | oprcanmerge | oprcanhash | oprleft | oprright | oprresult | oprcom | oprnegate | oprc

ode | oprrest | oprjoin

---------+--------------+----------+---------+-------------+------------+---------+----------+-----------+--------+-----------+-----

----+-------------+-----------------

< | 11 | 10 | b | f | f | 23 | 23 | 16 | 521 | 525 | int4

lt | scalarltsel | scalarltjoinsel

(1 row)

接下来我要通过更新pg_operator解除他们的commutator关系,设置为0即可。

postgres=# update pg_operator set oprcom=0 where oprname='>' and oprcode='int4gt'::regproc;

UPDATE 1

postgres=# update pg_operator set oprcom=0 where oprname='<' and oprcode='int4lt'::regproc;

UPDATE 1

创建测试表,插入测试数据,创建索引:

postgres=# create table tbl(ID int);

CREATE table

postgres=# insert into tbl select generate_serIEs(1,100000);

INSERT 0 100000

postgres=# create index IDx_tbl_ID on tbl(ID);

CREATE INDEX

将列放在条件的左边可以走索引,但是放在右边不走索引。因为优化器不能决定>,<是否为commutator

postgres=# explain select * from tbl where ID<10;

query PLAN

---------------------------------------------------------------------------

Index Only Scan using IDx_tbl_ID on tbl (cost=0.29..8.45 rows=9 wIDth=4)

Index Cond: (ID < 10)

(2 rows)

postgres=# explain select * from tbl where 10>ID;

query PLAN

----------------------------------------------------------

Seq Scan on tbl (cost=0.00..1361.00 rows=33333 wIDth=4)

Filter: (10 > ID)

(2 rows)

重新建立这两个 operator的commutator关系后,优化器会自动将10>ID转换为ID<10,并且走索引了:

postgres=# update pg_operator set oprcom=521 where oprname='<' and oprcode='int4lt'::regproc;

UPDATE 1

postgres=# update pg_operator set oprcom=97 where oprname='>' and oprcode='int4gt'::regproc;

UPDATE 1

postgres=# explain select * from tbl where 10>ID;

query PLAN

---------------------------------------------------------------------------

Index Only Scan using IDx_tbl_ID on tbl (cost=0.29..8.45 rows=9 wIDth=4)

Index Cond: (ID < 10)

(2 rows)

2. negator,指x op1 y 等价于 not(y op2 x),或者x op1等价于not( y op2),或者op1 x 等价于not(op2 y),因此negator支持一元和二元 *** 作符。

例子:

如果=和<>是一对negator *** 作符,NOT (x = y) 可以简化为 x <> y。

postgres=# explain select * from tbl where 10=ID;

query PLAN

---------------------------------------------------------------------------

Index Only Scan using IDx_tbl_ID on tbl (cost=0.29..8.31 rows=1 wIDth=4)

Index Cond: (ID = 10)

(2 rows)

postgres=# explain select * from tbl where not(10<>ID);

query PLAN

---------------------------------------------------------------------------

Index Only Scan using IDx_tbl_ID on tbl (cost=0.29..8.31 rows=1 wIDth=4)

Index Cond: (ID = 10)

(2 rows)

同样, *** 作符两侧参数x,y的类型必须一致。并且仅适用于返回布尔逻辑类型的 *** 作符。

3. restrict,是用于评估选择性的函数,仅适用于二元 *** 作符,例如where col>100,这个查询条件,如何评估选择性呢?是通过 *** 作符的restrict来指定的,选择性乘以pg_class.reltuples就可以评估得到这个查询条件的行数。

选择性函数的代码在 src/backend/utils/adt/

包括

-rw-r--r--.11107110733191Jun1003:29array_selfuncs.c

-rw-r--r--.1110711072316Jun1003:29geo_selfuncs.c

-rw-r--r--.111071107 720Jun1003:29network_selfuncs.c

-rw-r--r--.11107110733895Jun1003:29rangetypes_selfuncs.c

-rw-r--r--.111071107218809Jun1003:29selfuncs.c

选择性函数,还需要依赖数据库的统计信息,从而计算选择性,常见的选择性计算函数有:

postgres=# select distinct oprrest from pg_operator order by 1;

oprrest

--------------

-

eqsel 相等

neqsel 不相等

scalarltsel 小于等于

scalargtsel 大于等于

areasel

positionsel

contsel

iclikesel

icnlikesel

regexeqsel

likesel

icregexeqsel

regexnesel

nlikesel

icregexnesel

rangesel

networksel

tsmatchsel

arraycontsel

(20 rows)

当然,用户如果自定义数据类型的话,也可以自定义选择性函数,或者使用以上标准的选择性函数,只是可能需要实现一下类型转换。

源码中的介绍:

src/backend/utils/adt/selfuncs.c

/*----------

* Operator selectivity estimation functions are called to estimate the

* selectivity of WHERE clauses whose top-level operator is their operator.

* We divIDe the problem into two cases:

* Restriction clause estimation: the clause involves vars of just

* one relation. 一种是符合WHERE条件的选择性(百分比)。

* Join clause estimation: the clause involves vars of multiple rels.

* Join selectivity estimation is far more difficult and usually less accurate

* than restriction estimation. -- JOIN的选择性评估通常没有WHERE条件的选择性准确。

*

* When dealing with the inner scan of a nestloop join,we consIDer the

* join's joinclauses as restriction clauses for the inner relation,and

* treat vars of the outer relation as parameters (a/k/a constants of unkNown

* values). So,restriction estimators need to be able to accept an argument

* telling which relation is to be treated as the variable.

在使用nestloop JOIN时,一个表的字段将作为变量,另一个表的字段(及其统计信息)与 *** 作符作为JOIN评估子句。

*

* The call convention for a restriction estimator (oprrest function) is

*

* Selectivity oprrest (PlannerInfo *root,

* OID operator,

* List *args,

* int varRelID);

* 评估选择性需要4个参数:

* root: general information about the query (rtable and RelOptInfo Lists

* are particularly important for the estimator). plannerinfo信息。

* operator: OID of the specific operator in question. *** 作符的OID

* args: argument List from the operator clause. *** 作符子句中的参数列表

* varRelID: if not zero,the relID (rtable index) of the relation to

* be treated as the variable relation. May be zero if the args List

* is kNown to contain vars of only one relation. 表示where条件所包含的参数来自哪些relation。

*

* This is represented at the sql level (in pg_proc) as

*

* float8 oprrest (internal,oID,internal,int4); 在pg_proc数据字典中表示为oprrest指定的函数。

*

* The result is a selectivity,that is,a fraction (0 to 1) of the rows

* of the relation that are expected to produce a TRUE result for the

* given operator. 选择性函数的评估结果就是一个百分比。乘以pg_class.reltuples就可以得到记录数。

*

* The call convention for a join estimator (oprjoin function) is similar

* except that varRelID is not needed,and instead join information is

* supplIEd:

* JOIN选择性的计算函数与WHERE选择性的计算函数参数有轻微差别,么有varRelID,增加了join信息的参数。

* Selectivity oprjoin (PlannerInfo *root,

* JoinType jointype,

* SpecialJoinInfo *sjinfo);

*

* float8 oprjoin (internal,int2,internal);

*

* (Before Postgres 8.4,join estimators had only the first four of these

* parameters. That signature is still allowed,but deprecated.) The

* relationship between jointype and sjinfo is explained in the comments for

* clause_selectivity() --- the short version is that jointype is usually

* best ignored in favor of examining sjinfo.

*

* Join selectivity for regular inner and outer joins is defined as the

* fraction (0 to 1) of the cross product of the relations that is expected

* to produce a TRUE result for the given operator. For both semi and anti (半连接与预连接)

* joins,however,the selectivity is defined as the fraction of the left-hand

* sIDe relation's rows that are expected to have a match (IE,at least one

* row with a TRUE result) in the right-hand sIDe.

*

* For both oprrest and oprjoin functions,the operator's input collation OID

* (if any) is passed using the standard fmgr mechanism,so that the estimator

* function can fetch it with PG_GET_ColLATION(). Note,that all

* statistics in pg_statistic are currently built using the database's default

* collation. Thus,in most cases where we are looking at statistics,we

* should ignore the actual operator collation and use DEFAulT_ColLATION_OID.

* We expect that the error induced by doing this is usually not large enough

* to justify complicating matters.

*----------

4. join,是joinsel即join的选择性计算函数。

对应pg_operator.oprjoin

postgres=# select distinct oprjoin from pg_operator order by 1;

oprjoin

------------------

-

eqjoinsel

neqjoinsel

scalarltjoinsel

scalargtjoinsel

areajoinsel

positionjoinsel

contjoinsel

iclikejoinsel

icnlikejoinsel

regexeqjoinsel

likejoinsel

icregexeqjoinsel

regexnejoinsel

nlikejoinsel

icregexnejoinsel

networkjoinsel

tsmatchjoinsel

arraycontjoinsel

(19 rows)

5. hashes

6. merges

hashes和merges表示该 *** 作符是否允许hash join和merge join,只有返回布尔逻辑值的二元 *** 作符满足这个要求。

我们在pg_operator这个catalog中也可以查看到对应的介绍:

name Type References Description
oID oID Row IDentifIEr (hIDden attribute; must be explicitly selected)
oprname name name of the operator
oprnamespace oID pg_namespace.oID The OID of the namespace that contains this operator
oprowner oID pg_authID.oID Owner of the operator
oprkind char b= infix ("between"),l= prefix ("left"),r= postfix ("right")
指定 *** 作符在什么位置,例如中间,左侧,右侧
oprcanmerge bool This operator supports merge joins此 *** 作符是否支持merge join
oprcanhash bool This operator supports hash joins此 *** 作符是否支持hash join
oprleft oID pg_type.oID Type of the left operand *** 作符左侧的数据类型
oprright oID pg_type.oID Type of the right operand *** 作符右侧的数据类型
oprresult oID pg_type.oID Type of the result返回结果的数据类型
oprcom oID pg_operator.oID Commutator of this operator,if any
oprnegate oID pg_operator.oID Negator of this operator,if any
oprcode regproc pg_proc.oID Function that implements this operator
oprrest regproc pg_proc.oID Restriction selectivity estimation function for this operator
oprjoin regproc pg_proc.oID Join selectivity estimation function for this operator
总结

以上是内存溢出为你收集整理的PostgreSQL *** 作符与优化器详解全部内容,希望文章能够帮你解决PostgreSQL *** 作符与优化器详解所遇到的程序开发问题。

如果觉得内存溢出网站内容还不错,欢迎将内存溢出网站推荐给程序员好友。

欢迎分享,转载请注明来源:内存溢出

原文地址: https://outofmemory.cn/sjk/1174309.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-06-02
下一篇 2022-06-02

发表评论

登录后才能评论

评论列表(0条)

保存