postgresql – 在timestamp列上按查询排序非常慢_sql

概述尽管有索引,并且id为主键,下面的查询大约需要15秒才能返回数据. select id from my_table order by insert_date offset 0 limit 1 解释分析如下 "Limit (cost=1766417.72..1766417.72 rows=1 width=12) (actual time=32479.440..32479.441 rows=1 lo 尽管有索引,并且ID为主键,下面的查询大约需要15秒才能返回数据.

select ID from my_table order by insert_date offset 0 limit 1

解释分析如下

"limit  (cost=1766417.72..1766417.72 rows=1 wIDth=12) (actual time=32479.440..32479.441 rows=1 loops=1)""  ->  Sort  (cost=1766417.72..1797117.34 rows=12279848 wIDth=12) (actual time=32479.437..32479.437 rows=1 loops=1)""        Sort Key: insert_date""        Sort Method: top-N heapsort  Memory: 25kB""        ->  Seq Scan on my_table  (cost=0.00..1705018.48 rows=12279848 wIDth=12) (actual time=0.006..21338.401 rows=12108916 loops=1)""Total runtime: 32479.476 ms"

我的表几乎没有其他专栏.但insert_date的类型是

insert_date timestamp without time zone NOT NulL DEFAulT Now(),

我在该特定日期列上有一个索引

CREATE INDEX my_table_insert_date_indx  ON my_table  USING btree  (insert_date)tableSPACE somexyz_IDx_ts;

postgresql.conf文件中的值很少：

shared_buffers = more than 1GB    ## just for an exampletemp_buffers = more than 1GBwork_mem = more than 1GBmaintenance_work_mem = more than 1GBdynamic_shared_memory_type = posixdefault_statistics_target = 10000autovacuum = onrandom_page_cost = 2.0cpu_index_tuple_cost = 0.0005

我现在正在使用postgres 9.3.

更新::

我刚刚运行了以下查询：

select insert_date,count(*) from my_table group by insert_date

而结果中的前几位是：

"2015-04-02 00:00:00";3718104"2015-04-03 00:00:00";6410253"2015-04-04 00:00:00";538247"2015-04-05 00:00:00";1228877"2015-04-06 00:00:00";131248

我在那张桌子上有大约1200万条记录.而上述数字几乎接近总数.

不确定,但是在一个有大量重复值的列上创建索引可能是一个问题吗？如果确实如此,那么我们有什么办法吗？

解决方法使用Postgresql 9.3和9.4,我的机器上的查询运行速度提高了大约160000倍.我的机器没什么特别的.

-- From Postgresql 9.4; 9.3 is similar.show shared_buffers; -- 128MBshow temp_buffers; -- 8MBshow work_mem; -- 4MBshow maintenance_work_mem; -- 64MBshow dynamic_shared_memory_type; -- posixshow default_statistics_target; -- 100show autovacuum; -- onshow random_page_cost; -- 4show cpu_index_tuple_cost; -- 0.005

制备

让我们建一张桌子. (你应该在你的问题中这样做.)

create table my_table (  ID serial primary key,insert_date timestamp not null);-- Round numbers of rows.insert into my_table(insert_date)select timestamp '2015-04-02 00:00:00'from generate_serIEs(1,3000000) n;insert into my_table(insert_date)select timestamp '2015-04-03 00:00:00'from generate_serIEs(1,6000000) n;insert into my_table(insert_date)select timestamp '2015-04-04 00:00:00'from generate_serIEs(1,500000) n;insert into my_table(insert_date)select timestamp '2015-04-05 00:00:00'from generate_serIEs(1,1200000) n;insert into my_table(insert_date)select timestamp '2015-04-06 00:00:00'from generate_serIEs(1,131000) n;

创建索引并更新统计信息.

create index on my_table (insert_date);analyze my_table;

Postgresql 9.4

现在,我们从您的第一个查询中获得了什么样的执行计划？

explain analyze select ID from my_table order by insert_date offset 0 limit 1;

"limit  (cost=0.43..0.48 rows=1 wIDth=12) (actual time=0.014..0.014 rows=1 loops=1)""  ->  Index Scan using my_table_insert_date_IDx on my_table  (cost=0.43..540656.27 rows=11200977 wIDth=12) (actual time=0.012..0.012 rows=1 loops=1)""Planning time: 0.195 ms""Execution time: 0.032 ms"

Postgresql 9.3

explain analyze select ID from my_table order by insert_date offset 0 limit 1;

"limit  (cost=0.43..0.47 rows=1 wIDth=12) (actual time=0.058..0.059 rows=1 loops=1)""  ->  Index Scan using my_table_insert_date_IDx on my_table  (cost=0.43..339814.36 rows=10830995 wIDth=12) (actual time=0.057..0.057 rows=1 loops=1)""Total runtime: 0.098 ms"

您的查询

select ID from my_table order by insert_date offset 0 limit 1;

是不确定的.根据ORDER BY子句,有300万行具有最低的insert_date(首先出现的日期).你挑选了300万中的一个. Postgresql不保证你每次都会得到相同的ID.

如果你不关心它返回的300万个ID中的哪一个,你可以用不同的方式表达查询.但我不认为表达不同会给你160k倍的加速.

您可以针对特定查询更改您包含的某些设置.所以你可以做这样的事情.

-- Don't commit or rollback . . . begin transaction;set local work_mem = '8 MB';explain analyze select ID from my_table order by insert_date offset 0 limit 1;-- displays the result.

手动提交或回滚.