18 Spark on EXPLAIN: Performance Tuning


Description
The EXPLAIN statement provides the logical and physical plans for an input statement. By default, it returns information about the physical plan only.

Syntax
EXPLAIN [ EXTENDED | CODEGEN | COST | FORMATTED ] statement


EXTENDED

Generates the parsed logical plan, analyzed logical plan, optimized logical plan, and physical plan. The parsed logical plan is an unresolved plan extracted from the query. The analyzed logical plan resolves it, translating UnresolvedAttribute and UnresolvedRelation into fully typed objects. The optimized logical plan is produced by applying a set of optimization rules, and the physical plan is then generated from it.

CODEGEN

Generates code for the statement, if any, and a physical plan. The output shows the executable Java code produced by Codegen.

COST

If plan node statistics are available, generates the optimized logical plan together with those statistics.

FORMATTED

Generates two separate sections: a physical plan outline and the details of each node, which makes the physical plan easier to read.

statement

Specifies a SQL statement to be explained.

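In the Spark 3.0 major release, Spark SQL accounted for nearly 50% of the optimizations. Spark SQL has replaced Spark Core as the new-generation engine core, so all the other sub-frameworks, such as MLlib, Streaming, and Graph, can share Spark SQL's performance optimizations and benefit from the Spark community's investment in Spark SQL.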

-- Using Extended
EXPLAIN EXTENDED select k, sum(v) from values (1, 2), (1, 3) t(k, v) group by k;
+----------------------------------------------------+
|                                                plan|
+----------------------------------------------------+
| == Parsed Logical Plan ==
 'Aggregate ['k], ['k, unresolvedalias('sum('v), None)]
 +- 'SubqueryAlias `t`
    +- 'UnresolvedInlineTable [k, v], [List(1, 2), List(1, 3)]
   
 == Analyzed Logical Plan ==
 k: int, sum(v): bigint
 Aggregate [k#47], [k#47, sum(cast(v#48 as bigint)) AS sum(v)#50L]
 +- SubqueryAlias `t`
    +- LocalRelation [k#47, v#48]
   
 == Optimized Logical Plan ==
 Aggregate [k#47], [k#47, sum(cast(v#48 as bigint)) AS sum(v)#50L]
 +- LocalRelation [k#47, v#48]
   
 == Physical Plan ==
 *(2) HashAggregate(keys=[k#47], functions=[sum(cast(v#48 as bigint))], output=[k#47, sum(v)#50L])
 +- Exchange hashpartitioning(k#47, 200), true, [id=#79]
    +- *(1) HashAggregate(keys=[k#47], functions=[partial_sum(cast(v#48 as bigint))], output=[k#47, sum#52L])
       +- *(1) LocalTableScan [k#47, v#48]
|
+----------------------------------------------------+
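
The other modes listed under Syntax can be tried on the same inline query. The following is a minimal sketch: it reuses the statement from the EXTENDED example and changes only the keyword. Outputs are left out on purpose, since expression ids, generated code, and statistics are likely to differ between Spark versions and sessions.

```sql
-- Same inline query as in the EXTENDED example; only the EXPLAIN mode changes.

-- CODEGEN: generated code for the statement (if any) and the physical plan
EXPLAIN CODEGEN select k, sum(v) from values (1, 2), (1, 3) t(k, v) group by k;

-- COST: logical plan with node statistics, when statistics are available
EXPLAIN COST select k, sum(v) from values (1, 2), (1, 3) t(k, v) group by k;

-- FORMATTED: physical plan outline followed by per-node details
EXPLAIN FORMATTED select k, sum(v) from values (1, 2), (1, 3) t(k, v) group by k;
```

Running all three against one query makes it easier to line up the nodes of the FORMATTED outline with the stages that appear in the CODEGEN output.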




Feel free to share. When reposting, please credit the source: 内存溢出

Original article: https://outofmemory.cn/zaji/5702163.html
