PostgreSQL does not use partial index

I have a table in PostgreSQL 9.2 with a text column. Let's call it text_col. The values in this column are fairly unique (a value may have up to 5-6 duplicates at most). The table has about 5 million rows. About half of these rows contain a null value for text_col. When I execute the following query, I expect 1-5 rows. In most cases (> 80%) I expect only 1 row.

Query

EXPLAIN ANALYZE SELECT col1, col2, .. colN FROM table WHERE text_col = 'my_value';

A btree index exists on text_col. The query planner never uses this index, and I am not sure why. This is the output of the query.

Plan

Seq Scan on two  (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
  Filter: (victor = 'foxtrot'::text)
  Rows Removed by Filter: 4077384

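I added another, partial index to try to exclude the rows where text_col is null, but that did not help (with or without text_pattern_ops; I do not need text_pattern_ops since my queries contain no LIKE conditions, but such indexes also match equality).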

CREATE INDEX name_idx ON table USING btree (text_col COLLATE pg_catalog."default" text_pattern_ops) WHERE text_col IS NOT NULL;

Even after disabling sequential scans with set enable_seqscan = off; the planner still chooses the seq scan over an index scan. In summary:

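- The number of rows returned by this query is small.
- Given that the non-null values are fairly unique, an index scan on the text column should be faster.
- Running VACUUM and ANALYZE on the table did not help the optimizer pick the index.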

My questions

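- Why does the database pick the sequential scan over the index scan?
- When a table has a text column that needs to be checked for equality, are there any best practices I can follow?
- How do I reduce the time this query takes?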

[Edit - more information]

- The index scan is picked up on my local database, which contains about 10% of the production data.

A partial index is a good idea to exclude the half of the table's rows that you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col) WHERE text_col IS NOT NULL;

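Be sure to run ANALYZE table after creating the index. (Autovacuum does this automatically after some time if you do not do it manually, but if you test right after creating the index, your test will fail.)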
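A minimal sketch of that step, assuming a hypothetical table name my_table in place of the placeholder table (which is a reserved word in PostgreSQL and would need quoting if used literally). The pg_stat_user_tables and pg_stats views are standard system views; what they return depends entirely on your data.

```sql
-- my_table is a hypothetical stand-in for the real table name.
ANALYZE my_table;

-- When were statistics last gathered, manually or by autovacuum?
SELECT last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';

-- What does the planner currently believe about text_col?
-- null_frac should be close to 0.5 if about half the rows are NULL.
SELECT null_frac, n_distinct
FROM   pg_stats
WHERE  tablename = 'my_table'
AND    attname   = 'text_col';
```

If null_frac or n_distinct look far from reality, the row estimate in the plan (rows=93 in the question, against 2 actual rows) is likely to be off as well.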

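Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query, even if it seems completely redundant: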

SELECT col1, col2, .. colN
FROM   table
WHERE  text_col = 'my_value'
AND    text_col IS NOT NULL;  -- repeat condition

Voilà.
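To check whether the planner actually picks the partial index, the plan can be inspected directly. This is only a sketch: my_table, col1 and col2 are hypothetical stand-ins for the real table and column list, and the resulting plan depends on your data, statistics and PostgreSQL version.

```sql
-- Hypothetical names: my_table, col1, col2.
EXPLAIN ANALYZE
SELECT col1, col2
FROM   my_table
WHERE  text_col = 'my_value'
AND    text_col IS NOT NULL;  -- repeats the index predicate

-- Cumulative usage counter for the index; idx_scan increases
-- each time the index is used in a scan.
SELECT indexrelname, idx_scan
FROM   pg_stat_user_indexes
WHERE  indexrelname = 'name_idx';
```

An Index Scan or Bitmap Index Scan node that mentions name_idx means the index is in use; a Seq Scan node means the planner still prefers reading the whole table.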

Per documentation：

However, keep in mind that the predicate must match the conditions
used in the queries that are supposed to benefit from the index. To be
precise, a partial index can be used in a query only if the system can
recognize that the WHERE condition of the query mathematically implies
the predicate of the index. PostgreSQL does not have a sophisticated
theorem prover that can recognize mathematically equivalent
expressions that are written in different forms. (Not only is such a
general theorem prover extremely difficult to create, it would
probably be too slow to be of any real use.) The system can recognize
simple inequality implications, for example "x < 1" implies "x < 2";
otherwise the predicate condition must exactly match part of the
query's WHERE condition or the index will not be recognized as usable.
Matching takes place at query planning time, not at run time. As a
result, parameterized query clauses do not work with a partial index.
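The inequality implication mentioned in the quote can be tried out with a small experiment. The table demo and the index demo_x_small_idx are hypothetical names invented for this sketch. On a tiny table the planner may prefer a sequential scan regardless, so the sketch disables sequential scans for the session purely to make the difference visible.

```sql
-- Hypothetical demo objects, not part of the original question.
CREATE TABLE demo (x integer, payload text);
CREATE INDEX demo_x_small_idx ON demo (x) WHERE x < 2;
ANALYZE demo;

-- For experimentation only: discourage sequential scans.
SET enable_seqscan = off;

-- "x < 1" implies "x < 2", so the partial index is a candidate.
EXPLAIN SELECT * FROM demo WHERE x < 1;

-- "x < 3" does not imply "x < 2": rows with x = 2 are not in the
-- index, so the planner cannot use it for this query.
EXPLAIN SELECT * FROM demo WHERE x < 3;

RESET enable_seqscan;
```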