PostgreSQL query slows down when sorting by columns from two different tables


Previously I used this query, and it was fast:

    cb=# explain analyze SELECT "web_route"."id", "web_crag"."id" FROM "web_route" INNER JOIN "web_crag" ON ( "web_route"."crag_id" = "web_crag"."id" ) WHERE "web_crag"."type" IN (1, 2) ORDER BY "web_crag"."name" ASC LIMIT 20;
                                                                     QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..2.16 rows=20 width=18) (actual time=0.027..0.105 rows=20 loops=1)
       ->  Nested Loop  (cost=0.00..47088.94 rows=436055 width=18) (actual time=0.026..0.100 rows=20 loops=1)
             ->  Index Scan using web_crag_name on web_crag  (cost=0.00..503.16 rows=1776 width=14) (actual time=0.011..0.020 rows=14 loops=1)
                   Filter: (type = ANY ('{1,2}'::integer[]))
             ->  Index Scan using web_route_crag_id on web_route  (cost=0.00..23.27 rows=296 width=8) (actual time=0.004..0.005 rows=1 loops=14)
                   Index Cond: (crag_id = web_crag.id)
     Total runtime: 0.154 ms
    (7 rows)

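The problem with this query is that the order of the returned rows is not deterministic. When I page through the results with OFFSET, rows show up again on later pages, so pagination in my web application does not work correctly. I decided to make the ordering strict by adding "web_route"."id" as a second sort key.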
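To see why a non-unique sort key breaks OFFSET pagination, here is a small Python simulation. It is not PostgreSQL itself: the hypothetical `ROWS` list and the `reverse_ties` flag stand in for a database that is free to return rows with equal crag names in a different order on each query. When the sort is on the name alone, page 2 can repeat a row from page 1. Adding the unique route id as a tiebreaker leaves only one valid order.

```python
# Hypothetical rows: (route_id, crag_name). Three routes share one crag name,
# so ORDER BY crag_name alone does not define a single order.
ROWS = [(1, "Arco"), (2, "Arco"), (3, "Arco"), (4, "Bishop"), (5, "Bishop")]
PAGE_SIZE = 2


def page(rows, key, offset, reverse_ties=False):
    """Return one LIMIT/OFFSET page of rows sorted by `key`.

    reverse_ties=True simulates the database returning tied rows in a
    different (but equally valid) order on a later query.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=reverse_ties)
    ordered = sorted(ordered, key=key)  # stable: tied rows keep the prior order
    return ordered[offset:offset + PAGE_SIZE]


def name_only(row):
    return row[1]


def name_then_id(row):
    return (row[1], row[0])  # the unique route id breaks ties


def first_two_pages(key):
    """Fetch page 1 and page 2 as two separate 'queries', like a web app would."""
    first = page(ROWS, key, 0)
    second = page(ROWS, key, PAGE_SIZE, reverse_ties=True)
    return [route_id for route_id, _name in first + second]


print(first_two_pages(name_only))     # [1, 2, 1, 5]  (route 1 shown twice)
print(first_two_pages(name_then_id))  # [1, 2, 3, 4]
```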

    cb=# explain analyze SELECT "web_route"."id", "web_crag"."id" FROM "web_route" INNER JOIN "web_crag" ON ( "web_route"."crag_id" = "web_crag"."id" ) WHERE "web_crag"."type" IN (1, 2) ORDER BY "web_crag"."name", "web_route"."id" ASC LIMIT 20;
                                                                 QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=29189.04..29189.09 rows=20 width=18) (actual time=324.065..324.068 rows=20 loops=1)
       ->  Sort  (cost=29189.04..30279.18 rows=436055 width=18) (actual time=324.063..324.064 rows=20 loops=1)
             Sort Key: web_crag.name, web_route.id
             Sort Method: top-N heapsort  Memory: 26kB
             ->  Hash Join  (cost=135.40..17585.78 rows=436055 width=18) (actual time=0.882..195.941 rows=435952 loops=1)
                   Hash Cond: (web_route.crag_id = web_crag.id)
                   ->  Seq Scan on web_route  (cost=0.00..10909.55 rows=436055 width=8) (actual time=0.026..55.916 rows=435952 loops=1)
                   ->  Hash  (cost=113.20..113.20 rows=1776 width=14) (actual time=0.848..0.848 rows=1775 loops=1)
                         Buckets: 1024  Batches: 1  Memory Usage: 82kB
                         ->  Seq Scan on web_crag  (cost=0.00..113.20 rows=1776 width=14) (actual time=0.004..0.510 rows=1775 loops=1)
                               Filter: (type = ANY ('{1,2}'::integer[]))
     Total runtime: 324.101 ms
    (12 rows)

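But as you can see, the query is now more than 2000x slower (0.154 ms vs. 324 ms), which is quite a lot :). I'd like to know what, if anything, can be done about it. I'm considering a rather ugly hack: copy "web_crag"."name" into "web_route" so that I can put an index on the two columns (crag_name, id). I'd be glad to hear of a better way, though.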

Here are the schemas of "web_route" and "web_crag", just in case.

    cb=# \d web_crag;
                                          Table "public.web_crag"
         Column      |           Type           |                       Modifiers
    -----------------+--------------------------+-------------------------------------------------------
     id              | integer                  | not null default nextval('web_crag_id_seq'::regclass)
     name            | character varying(64)    | not null
     latitude        | double precision         |
     longitude       | double precision         |
     type            | integer                  |
     description     | text                     | not null
     normalized_name | character varying(64)    | not null
     country_id      | integer                  |
     location_index  | character(24)            | not null
     added_by_id     | integer                  |
     date_created    | timestamp with time zone |
     last_modified   | timestamp with time zone |
    Indexes:
        "web_crag_pkey" PRIMARY KEY, btree (id)
        "web_crag_added_by_id" btree (added_by_id)
        "web_crag_country_id" btree (country_id)
        "web_crag_location_index" btree (location_index)
        "web_crag_name" btree (name)
    Foreign-key constraints:
        "added_by_id_refs_id_1745ebe43b31bec6" FOREIGN KEY (added_by_id) REFERENCES web_member(id) DEFERRABLE INITIALLY DEFERRED
        "country_id_refs_id_1384050a9bd763af" FOREIGN KEY (country_id) REFERENCES web_country(id) DEFERRABLE INITIALLY DEFERRED
    Referenced by:
        TABLE "web_route" CONSTRAINT "crag_id_refs_id_3ce1145606d12740" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
        TABLE "web_video" CONSTRAINT "crag_id_refs_id_4fc9cbf2832725ca" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
        TABLE "web_image" CONSTRAINT "crag_id_refs_id_58210dd331468848" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
        TABLE "web_eventdestination" CONSTRAINT "crag_id_refs_id_612ad57c4d76c32c" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
    Triggers:
        set_crag_location_index BEFORE INSERT OR UPDATE ON web_crag FOR EACH ROW EXECUTE PROCEDURE set_crag_location_index()

    cb=# \d web_route
                                            Table "public.web_route"
           Column       |           Type           |                       Modifiers
    --------------------+--------------------------+--------------------------------------------------------
     id                 | integer                  | not null default nextval('web_route_id_seq'::regclass)
     name               | character varying(64)    | not null
     crag_id            | integer                  | not null
     sector             | character varying(64)    | not null
     difficulty         | character varying(16)    | not null
     author             | character varying(64)    | not null
     build_date         | character varying(32)    | not null
     description        | text                     | not null
     difficulty_numeric | integer                  |
     length_meters      | double precision         |
     added_by_id        | integer                  |
     date_created       | timestamp with time zone |
     last_modified      | timestamp with time zone |
     normalized_name    | character varying(64)    | not null
     rating_votes       | integer                  | not null
     rating_score       | integer                  | not null
    Indexes:
        "web_route_pkey" PRIMARY KEY, btree (id)
        "web_route_added_by_id" btree (added_by_id)
        "web_route_crag_id" btree (crag_id)
    Check constraints:
        "ck_rating_votes_pstv_c39bae29f3b2012" CHECK (rating_votes >= 0)
        "web_route_rating_votes_check" CHECK (rating_votes >= 0)
    Foreign-key constraints:
        "added_by_id_refs_id_157791930f5e12d5" FOREIGN KEY (added_by_id) REFERENCES web_member(id) DEFERRABLE INITIALLY DEFERRED
        "crag_id_refs_id_3ce1145606d12740" FOREIGN KEY (crag_id) REFERENCES web_crag(id) DEFERRABLE INITIALLY DEFERRED
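
Solution

Unfortunately, PostgreSQL is not yet good at optimizing this kind of sort. If it cannot find an index that exactly matches the ORDER BY clause, it always sorts the entire result set at once.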
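That is what the slow plan above shows. The "top-N heapsort" keeps only 20 rows in memory (26kB), but it can do so only after the hash join has produced every candidate row (435952 of them). Below is a minimal Python sketch of that plan shape, with hypothetical in-memory data standing in for the two tables. `heapq.nsmallest` keeps a bounded heap, yet it still has to consume the whole joined input.

```python
import heapq


def top_n_after_full_join(crag_names, routes, limit):
    """Mimic Hash Join -> top-N heapsort -> Limit.

    crag_names: dict crag_id -> crag name (the hashed, already-filtered side).
    routes: list of (route_id, crag_id) tuples (the sequentially scanned side).
    Returns the top rows as (route_id, crag_id) plus how many routes were read.
    """
    consumed = 0

    def joined():
        nonlocal consumed
        for route_id, crag_id in routes:
            consumed += 1  # every route row is read, even though only `limit` are kept
            if crag_id in crag_names:  # hash probe on crag_id
                yield (crag_names[crag_id], route_id, crag_id)

    # Bounded heap of size `limit`, ordered by (name, route_id) like the Sort Key.
    top = heapq.nsmallest(limit, joined())
    return [(route_id, crag_id) for _name, route_id, crag_id in top], consumed


crag_names = {1: "Bishop", 3: "Arco"}  # crag 2 is excluded by the type filter
routes = [(30, 3), (10, 3), (5, 1), (20, 3), (1, 1), (9, 2)]
print(top_n_after_full_join(crag_names, routes, 2))  # ([(10, 3), (20, 3)], 6)
```

The second value is the point. All 6 route rows were read to return 2, just as the real plan reads all 435952 joined rows to return 20.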

Starting with PostgreSQL 9.3, you can trick the planner into doing the right thing with a LATERAL subquery. Try this:

    SELECT "web_route"."id", "web_crag"."id"
    FROM "web_crag",
    LATERAL (
        SELECT * FROM "web_route"
        WHERE "web_route"."crag_id" = "web_crag"."id"
        ORDER BY "web_route"."id" ASC
    ) AS "web_route"
    WHERE "web_crag"."type" IN (1, 2)
    ORDER BY "web_crag"."name"
    LIMIT 20;
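The nested-loop plan this query gets can be mimicked in a few lines of Python. This is a sketch with hypothetical in-memory data, not how PostgreSQL is implemented. It walks the crags in name order, as the index scan on web_crag.name would, and sorts only the current crag's routes by id. It stops as soon as LIMIT is satisfied.

The sketch assumes crag names are unique. If two crags share a name, this order (name, then crag, then route id) differs from `ORDER BY name, route id`. Also note that SQL itself only guarantees the outer ORDER BY. The per-crag id order here comes from the nested-loop plan, not from a formal promise about the subquery's ORDER BY.

```python
from itertools import islice


def lateral_top_n(crags_by_name, routes_by_crag, types, limit):
    """Nested-loop sketch of the LATERAL query.

    crags_by_name: (crag_id, crag_type, crag_name) tuples already ordered by
        name, standing in for the index scan on web_crag.name.
    routes_by_crag: dict crag_id -> route ids, standing in for the index on
        web_route.crag_id.
    """
    def rows():
        for crag_id, crag_type, _name in crags_by_name:
            if crag_type not in types:  # Filter: type = ANY ('{1,2}')
                continue
            # Inner side: sort only this crag's few routes, not the whole join.
            for route_id in sorted(routes_by_crag.get(crag_id, [])):
                yield (route_id, crag_id)

    # LIMIT: stop pulling from the nested loop once enough rows exist.
    return list(islice(rows(), limit))


def full_sort_top_n(crags_by_name, routes_by_crag, types, limit):
    """Reference result: build the whole join, then sort by (name, route id)."""
    names = {cid: name for cid, ctype, name in crags_by_name if ctype in types}
    joined = [(rid, cid) for cid in names for rid in routes_by_crag.get(cid, [])]
    joined.sort(key=lambda row: (names[row[1]], row[0]))
    return joined[:limit]


crags = [(3, 1, "Arco"), (1, 2, "Bishop"), (4, 3, "Ceuse"), (2, 1, "Dolomites")]
routes_by_crag = {3: [30, 10, 20], 1: [5, 1], 4: [7], 2: [2, 9]}
print(lateral_top_n(crags, routes_by_crag, {1, 2}, 4))
# [(10, 3), (20, 3), (30, 3), (1, 1)]
```

With unique names, both functions agree. The LATERAL version, however, never sorts more than one crag's routes at a time.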

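I generated some simple test data (1 million web_crag rows, 5 million web_route rows). Here are the query plan and timings. Apart from the extra sort on web_route.id, the plan is almost identical to your first query plan: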

     Limit  (cost=24.36..120.70 rows=20 width=14) (actual time=0.051..0.169 rows=20 loops=1)
       ->  Nested Loop  (cost=24.36..24084788.95 rows=5000000 width=14) (actual time=0.049..0.143 rows=20 loops=1)
             ->  Index Scan using web_crag_name_idx on web_crag  (cost=0.42..39131.46 rows=1000000 width=10) (actual time=0.018..0.023 rows=4 loops=1)
                   Filter: (type = ANY ('{1,2}'::integer[]))
             ->  Sort  (cost=23.93..23.95 rows=5 width=8) (actual time=0.018..0.021 rows=5 loops=4)
                   Sort Key: web_route.id
                   Sort Method: quicksort  Memory: 25kB
                   ->  Index Scan using web_route_crag_id_idx on web_route  (cost=0.43..23.88 rows=5 width=8) (actual time=0.005..0.011 rows=5 loops=4)
                         Index Cond: (crag_id = web_crag.id)
     Total runtime: 0.212 ms

You can avoid the per-crag sort with an additional index on web_route(crag_id, id):

     Limit  (cost=0.86..19.49 rows=20 width=14) (actual time=0.031..0.113 rows=20 loops=1)
       ->  Nested Loop  (cost=0.86..4659293.82 rows=5000000 width=14) (actual time=0.029..0.084 rows=20 loops=1)
             ->  Index Scan using web_crag_name_idx on web_crag  (cost=0.42..39293.82 rows=1000000 width=10) (actual time=0.017..0.021 rows=4 loops=1)
                   Filter: (type = ANY ('{1,2}'::integer[]))
             ->  Index Only Scan using web_route_crag_id_id_idx on web_route  (cost=0.43..4.52 rows=5 width=8) (actual time=0.005..0.009 rows=5 loops=4)
                   Index Cond: (crag_id = web_crag.id)
                   Heap Fetches: 0
     Total runtime: 0.151 ms
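Here is why the extra index removes the Sort node. In a btree on (crag_id, id), all entries for one crag_id are adjacent and already ordered by id. A range scan therefore returns them in exactly the order the LATERAL subquery asks for.

The answer does not show the exact statement, but the index name web_route_crag_id_id_idx in the plan is what PostgreSQL generates for an unnamed `CREATE INDEX ON web_route (crag_id, id)`. Below is a rough Python model that treats the index as a sorted list of key tuples searched with `bisect`. The data is hypothetical, and a real btree is of course not a Python list.

```python
from bisect import bisect_left


def build_index(routes):
    """Model a btree on web_route(crag_id, id) as a sorted list of key tuples.

    routes: iterable of (route_id, crag_id) pairs.
    """
    return sorted((crag_id, route_id) for route_id, crag_id in routes)


def routes_for_crag(index, crag_id):
    """Range scan for one crag_id: matching keys are contiguous and already
    ordered by route id, so no separate sort step is needed."""
    i = bisect_left(index, (crag_id,))  # (crag_id,) sorts before (crag_id, any_id)
    out = []
    while i < len(index) and index[i][0] == crag_id:
        out.append(index[i][1])
        i += 1
    return out


index = build_index([(30, 3), (10, 3), (5, 1), (20, 3), (1, 1)])
print(routes_for_crag(index, 3))  # [10, 20, 30]
print(routes_for_crag(index, 2))  # []
```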

This is how I created the test data:

    create table web_crag(id serial primary key, type int default 1, name text);
    create table web_route(id serial primary key, crag_id int);
    insert into web_crag (name) select generate_series(1,1000000)::text;
    insert into web_route (crag_id) select id from web_crag cross join generate_series(1,5);
    create index on web_crag(name);
    create index on web_route(crag_id);
    analyze web_route;

PostgreSQL patch

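There is a "partial sort" patch to PostgreSQL that would do roughly this kind of optimization automatically, but unfortunately it did not make it into PostgreSQL 9.4. Hopefully PostgreSQL 9.5 will include it (expected around the second half of 2015).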
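The idea behind that patch can be sketched in Python. This is an illustration only, not the patch's code. If the input already arrives ordered by a prefix of the sort key (such as web_crag.name from an index), only each run of equal prefix values needs to be sorted by the remaining key, and later runs are never sorted once LIMIT rows have been produced. This approach later shipped in PostgreSQL 13 as the "Incremental Sort" plan node. The function name and data below are hypothetical.

```python
from itertools import groupby, islice


def partial_sort(rows, prefix_key, rest_key, limit):
    """Return the first `limit` rows ordered by (prefix_key, rest_key).

    Assumes `rows` is already ordered by prefix_key (e.g. it comes from an
    index scan on web_crag.name); only runs of equal prefix values are sorted.
    """
    def gen():
        for _prefix, run in groupby(rows, key=prefix_key):
            yield from sorted(run, key=rest_key)  # sort one small run at a time
    return list(islice(gen(), limit))


rows = [("Arco", 30), ("Arco", 10), ("Bishop", 5), ("Bishop", 1), ("Ceuse", 7)]
print(partial_sort(rows, lambda r: r[0], lambda r: r[1], 3))
# [('Arco', 10), ('Arco', 30), ('Bishop', 1)]
```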


Source: http://outofmemory.cn/sjk/1160196.html
