Spark SQL 内置函数(三)Date and Timestamp Functions(基于 Spark 3.2.0)

Spark SQL 内置函数(三)Date and Timestamp Functions(基于 Spark 3.2.0),第1张

Spark SQL 内置函数(三)Date and Timestamp Functions(基于 Spark 3.2.0) 前言

本文隶属于专栏《1000个问题搞定大数据技术体系》,该专栏为笔者原创,引用请注明来源,不足和错误之处请在评论区帮忙指出,谢谢!

本专栏目录结构和参考文献请见1000个问题搞定大数据技术体系

正文 add_months(start_date, num_months) 描述

返回start_date 加上 num_months 月后的日期。

实践
SELECt add_months('2016-08-31', 1);
+-------------------------+
|add_months(2016-08-31, 1)|
+-------------------------+
|               2016-09-30|
+-------------------------+
current_date() 描述
  • 返回查询评估起始阶段的当前日期。
  • 同一个查询所有的 current_date 返回的值都是相同的。
实践
SELECT current_date();
+--------------+
|current_date()|
+--------------+
|    2021-10-06|
+--------------+
current_date 描述
  • 返回查询评估起始阶段的当前日期。
实践
SELECT current_date;
+--------------+
|current_date()|
+--------------+
|    2021-10-06|
+--------------+
current_timestamp() 描述
  • 返回查询评估起始阶段的当前时间戳。
  • 同一个查询所有的 current_timestamp 返回的值都是相同的。
实践
SELECT current_timestamp();
+--------------------+
| current_timestamp()|
+--------------------+
|2021-10-06 14:13:...|
+--------------------+
current_timestamp 描述
  • 返回查询评估起始阶段的当前时间戳。
实践
SELECT current_timestamp;
+--------------------+
| current_timestamp()|
+--------------------+
|2021-10-06 14:13:...|
+--------------------+
current_timezone() 描述

返回当前 session 的本地时区。

实践
SELECT current_timezone();
+------------------+
|current_timezone()|
+------------------+
|           Etc/UTC|
+------------------+
date_add(start_date, num_days) 描述

返回start_date 加上 num_days 天后的日期。

实践
SELECT date_add('2016-07-30', 1);
+-----------------------+
|date_add(2016-07-30, 1)|
+-----------------------+
|             2016-07-31|
+-----------------------+
date_format(timestamp, fmt) 描述

基于特定的日期格式fmt转换时间戳 timestamp 为字符串。

实践
SELECT date_format('2016-04-08', 'y');
+--------------------------+
|date_format(2016-04-08, y)|
+--------------------------+
|                      2016|
+--------------------------+
date_from_unix_date(days) 描述
  • 创建一个日期
  • 这个日期从 1970-01-01 开始过了 days 天。
实践
SELECt date_from_unix_date(1);
+----------------------+
|date_from_unix_date(1)|
+----------------------+
|            1970-01-02|
+----------------------+
date_part(field, source) 描述
  • 抽取 source 中的一部分。
  • source 代表日期、时间戳或者时间间隔 。
实践
SELECt date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
+-------------------------------------------------------+
|date_part(YEAR, TIMESTAMP '2019-08-12 01:00:00.123456')|
+-------------------------------------------------------+
|                                                   2019|
+-------------------------------------------------------+

SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
+-------------------------------------------------------+
|date_part(week, TIMESTAMP '2019-08-12 01:00:00.123456')|
+-------------------------------------------------------+
|                                                     33|
+-------------------------------------------------------+

SELECT date_part('doy', DATE'2019-08-12');
+---------------------------------+
|date_part(doy, DATE '2019-08-12')|
+---------------------------------+
|                              224|
+---------------------------------+

SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
+----------------------------------------------------------+
|date_part(SECONDS, TIMESTAMP '2019-10-01 00:00:01.000001')|
+----------------------------------------------------------+
|                                                  1.000001|
+----------------------------------------------------------+

SELECT date_part('days', interval 5 days 3 hours 7 minutes);
+-------------------------------------------------+
|date_part(days, INTERVAL '5 03:07' DAY TO MINUTE)|
+-------------------------------------------------+
|                                                5|
+-------------------------------------------------+

SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
+-------------------------------------------------------------+
|date_part(seconds, INTERVAL '05:00:30.001001' HOUR TO SECOND)|
+-------------------------------------------------------------+
|                                                    30.001001|
+-------------------------------------------------------------+

SELECT date_part('MONTH', INTERVAL '2021-11' YEAR TO MONTH);
+--------------------------------------------------+
|date_part(MONTH, INTERVAL '2021-11' YEAR TO MONTH)|
+--------------------------------------------------+
|                                                11|
+--------------------------------------------------+

SELECT date_part('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND);
+---------------------------------------------------------------+
|date_part(MINUTE, INTERVAL '123 23:55:59.002001' DAY TO SECOND)|
+---------------------------------------------------------------+
|                                                             55|
+---------------------------------------------------------------+

doy 代表一年中的第几天。
dow 代表一周中的第几天。

date_sub(start_date, num_days) 描述

返回start_date 减去 num_days 天后的日期。

实践
SELECT date_sub('2016-07-30', 1);
+-----------------------+
|date_sub(2016-07-30, 1)|
+-----------------------+
|             2016-07-29|
+-----------------------+
date_trunc(fmt, ts) 描述
  • 针对时间 ts 执行截取 *** 作。
  • 截取到指定的单位 fmt 。
实践
SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
+-----------------------------------------+
|date_trunc(YEAR, 2015-03-05T09:32:05.359)|
+-----------------------------------------+
|                      2015-01-01 00:00:00|
+-----------------------------------------+

SELECT date_trunc('MM', '2015-03-05T09:32:05.359');
+---------------------------------------+
|date_trunc(MM, 2015-03-05T09:32:05.359)|
+---------------------------------------+
|                    2015-03-01 00:00:00|
+---------------------------------------+

SELECT date_trunc('DD', '2015-03-05T09:32:05.359');
+---------------------------------------+
|date_trunc(DD, 2015-03-05T09:32:05.359)|
+---------------------------------------+
|                    2015-03-05 00:00:00|
+---------------------------------------+

SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
+-----------------------------------------+
|date_trunc(HOUR, 2015-03-05T09:32:05.359)|
+-----------------------------------------+
|                      2015-03-05 09:00:00|
+-----------------------------------------+

SELECT date_trunc('MILLISECOND', '2015-03-05T09:32:05.123456');
+---------------------------------------------------+
|date_trunc(MILLISECOND, 2015-03-05T09:32:05.123456)|
+---------------------------------------------------+
|                               2015-03-05 09:32:...|
+---------------------------------------------------+
datediff(endDate, startDate) 描述

返回从 startDate 到 endDate 之间的天数。

实践
SELECT datediff('2009-07-31', '2009-07-30');
+--------------------------------+
|datediff(2009-07-31, 2009-07-30)|
+--------------------------------+
|                               1|
+--------------------------------+

SELECT datediff('2009-07-30', '2009-07-31');
+--------------------------------+
|datediff(2009-07-30, 2009-07-31)|
+--------------------------------+
|                              -1|
+--------------------------------+
day(date) 描述

返回日期或者时间戳在所在月份是第几天。

实践
SELECT day('2009-07-30');
+---------------+
|day(2009-07-30)|
+---------------+
|             30|
+---------------+
dayofmonth(date) 描述

返回日期或者时间戳在所在月份是第几天。

实践
SELECT dayofmonth('2009-07-30');
+----------------------+
|dayofmonth(2009-07-30)|
+----------------------+
|                    30|
+----------------------+

day(date)和 dayofmonth(date) 是一样的。

dayofweek(date) 描述
  • 返回日期或者时间戳在所在星期是第几天。
  • 0 代表星期天,6 代表星期六。
实践
SELECT dayofweek('2009-07-30');
+---------------------+
|dayofweek(2009-07-30)|
+---------------------+
|                    5|
+---------------------+
dayofyear(date) 描述
  • 返回日期或者时间戳在所在年份是第几天。
实践
SELECT dayofyear('2016-04-09');
+---------------------+
|dayofyear(2016-04-09)|
+---------------------+
|                  100|
+---------------------+
extract(field FROM source) 描述
  • 抽取 source 中的一部分。
  • source 代表日期、时间戳或者时间间隔 。
实践
SELECt extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456');
+---------------------------------------------------------+
|extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')|
+---------------------------------------------------------+
|                                                     2019|
+---------------------------------------------------------+

SELECt extract(week FROM timestamp'2019-08-12 01:00:00.123456');
+---------------------------------------------------------+
|extract(week FROM TIMESTAMP '2019-08-12 01:00:00.123456')|
+---------------------------------------------------------+
|                                                       33|
+---------------------------------------------------------+

SELECt extract(doy FROM DATE'2019-08-12');
+-----------------------------------+
|extract(doy FROM DATE '2019-08-12')|
+-----------------------------------+
|                                224|
+-----------------------------------+

SELECt extract(SEConDS FROM timestamp'2019-10-01 00:00:01.000001');
+------------------------------------------------------------+
|extract(SEConDS FROM TIMESTAMP '2019-10-01 00:00:01.000001')|
+------------------------------------------------------------+
|                                                    1.000001|
+------------------------------------------------------------+

SELECt extract(days FROM interval 5 days 3 hours 7 minutes);
+---------------------------------------------------+
|extract(days FROM INTERVAL '5 03:07' DAY TO MINUTE)|
+---------------------------------------------------+
|                                                  5|
+---------------------------------------------------+

SELECt extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
+---------------------------------------------------------------+
|extract(seconds FROM INTERVAL '05:00:30.001001' HOUR TO SECOND)|
+---------------------------------------------------------------+
|                                                      30.001001|
+---------------------------------------------------------------+

SELECt extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH);
+----------------------------------------------------+
|extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH)|
+----------------------------------------------------+
|                                                  11|
+----------------------------------------------------+

SELECt extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND);
+-----------------------------------------------------------------+
|extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND)|
+-----------------------------------------------------------------+
|                                                               55|
+-----------------------------------------------------------------+

date_part(field, source) 和 extract(field FROM source) 是一样的用法。

from_unixtime(unix_time[, fmt]) 描述

返回特定日期格式 fmt 的 unix_time 。

实践
SELECt from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
+-------------------------------------+
|from_unixtime(0, yyyy-MM-dd HH:mm:ss)|
+-------------------------------------+
|                  1970-01-01 00:00:00|
+-------------------------------------+

SELECt from_unixtime(0);
+-------------------------------------+
|from_unixtime(0, yyyy-MM-dd HH:mm:ss)|
+-------------------------------------+
|                  1970-01-01 00:00:00|
+-------------------------------------+
from_utc_timestamp(timestamp, timezone) 描述
  • 给定一个时间戳 timestamp,基于 UTC 方式和给定的时区 timezone 来解释成时间。
  • 比如:from_utc_timestamp(‘2017-07-14 02:40:00.0’ ,‘GMT+1’ )将会返回 ‘2017-07-14 03:40:00’
实践
SELECt from_utc_timestamp('2016-08-31', 'Asia/Seoul');
+------------------------------------------+
|from_utc_timestamp(2016-08-31, Asia/Seoul)|
+------------------------------------------+
|                       2016-08-31 09:00:00|
+------------------------------------------+

SELECt from_utc_timestamp('2021-11-25', 'Asia/Shanghai');

+------------------------------------------+
|from_utc_timestamp(2021-11-25, Asia/Shanghai)|
+------------------------------------------+
|                       2021-11-25 08:00:00|
+------------------------------------------+

SELECt from_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1' );

+----------------------------------------------------+
|from_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1')|
+----------------------------------------------------+
|                       		  2017-07-14 03:40:00|
+----------------------------------------------------+
hour(timestamp) 描述

返回字符串或者时间戳的小时部分。

实践
SELECt hour('2009-07-30 12:58:59');
+-------------------------+
|hour(2009-07-30 12:58:59)|
+-------------------------+
|                       12|
+-------------------------+


SELECT hour(current_timestamp());
+-------------------------+
|hour(current_timestamp())|
+-------------------------+
|                       22|
+-------------------------+
last_day(date) 描述

返回日期所在月份的最后一天。

实践
SELECT last_day('2009-01-12');
+--------------------+
|last_day(2009-01-12)|
+--------------------+
|          2009-01-31|
+--------------------+
make_date(year, month, day) 描述

根据年月日来创建日期。

实践
SELECT make_date(2013, 7, 15);
+----------------------+
|make_date(2013, 7, 15)|
+----------------------+
|            2013-07-15|
+----------------------+

SELECT make_date(2019, 13, 1);
+----------------------+
|make_date(2019, 13, 1)|
+----------------------+
|                  null|
+----------------------+

SELECT make_date(2019, 7, NULL);
+------------------------+
|make_date(2019, 7, NULL)|
+------------------------+
|                    null|
+------------------------+

SELECT make_date(2019, 2, 30);
+----------------------+
|make_date(2019, 2, 30)|
+----------------------+
|                  null|
+----------------------+
make_dt_interval([days[, hours[, mins[, secs]]]]) 描述

基于 天、小时、分、秒 来创建一个时间间隔。

实践
SELECT make_dt_interval(1, 12, 30, 01.001001);
+-------------------------------------+
|make_dt_interval(1, 12, 30, 1.001001)|
+-------------------------------------+
|                INTERVAL '1 12:30...'|
+-------------------------------------+


SELECT make_dt_interval(2);
+-----------------------------------+
|make_dt_interval(2, 0, 0, 0.000000)|
+-----------------------------------+
|              INTERVAL '2 00:00...'|
+-----------------------------------+


SELECT make_dt_interval(100, null, 3);
+----------------------------------------+
|make_dt_interval(100, NULL, 3, 0.000000)|
+----------------------------------------+
|                                    null|
+----------------------------------------+
make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) 描述

基于 年、月、星期、日、时、分、秒 来创建一个时间间隔。

实践
SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001);
+----------------------------------------------+
|make_interval(100, 11, 1, 1, 12, 30, 1.001001)|
+----------------------------------------------+
|                          100 years 11 mont...|
+----------------------------------------------+

SELECT make_interval(100, null, 3);
+----------------------------------------------+
|make_interval(100, NULL, 3, 0, 0, 0, 0.000000)|
+----------------------------------------------+
|                                          null|
+----------------------------------------------+

SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001);
+-------------------------------------------+
|make_interval(0, 1, 0, 1, 0, 0, 100.000001)|
+-------------------------------------------+
|                       1 months 1 days 1...|
+-------------------------------------------+
make_timestamp(year, month, day, hour, min, sec[, timezone]) 描述
  • 基于 年、月、日、时、分、秒、时区来创建一个时间戳。
  • 结果数据类型和配置 spark.sql.timestampType 定义的类型一致。
实践
ELECT make_timestamp(2014, 12, 28, 6, 30, 45.887);
+-------------------------------------------+
|make_timestamp(2014, 12, 28, 6, 30, 45.887)|
+-------------------------------------------+
|                       2014-12-28 06:30:...|
+-------------------------------------------+

SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET');
+------------------------------------------------+
|make_timestamp(2014, 12, 28, 6, 30, 45.887, CET)|
+------------------------------------------------+
|                            2014-12-28 05:30:...|
+------------------------------------------------+

SELECT make_timestamp(2019, 6, 30, 23, 59, 60);
+---------------------------------------+
|make_timestamp(2019, 6, 30, 23, 59, 60)|
+---------------------------------------+
|                    2019-07-01 00:00:00|
+---------------------------------------+

SELECT make_timestamp(2019, 6, 30, 23, 59, 1);
+--------------------------------------+
|make_timestamp(2019, 6, 30, 23, 59, 1)|
+--------------------------------------+
|                   2019-06-30 23:59:01|
+--------------------------------------+

SELECT make_timestamp(2019, 13, 1, 10, 11, 12, 'PST');
+--------------------------------------------+
|make_timestamp(2019, 13, 1, 10, 11, 12, PST)|
+--------------------------------------------+
|                                        null|
+--------------------------------------------+

SELECT make_timestamp(null, 7, 22, 15, 30, 0);
+--------------------------------------+
|make_timestamp(NULL, 7, 22, 15, 30, 0)|
+--------------------------------------+
|                                  null|
+--------------------------------------+
make_ym_interval([years[, months]]) 描述

创建一个基于 年、月 的时间间隔。

实践
SELECT make_ym_interval(1, 2);
+----------------------+
|make_ym_interval(1, 2)|
+----------------------+
|  INTERVAL '1-2' YE...|
+----------------------+

SELECT make_ym_interval(1, 0);
+----------------------+
|make_ym_interval(1, 0)|
+----------------------+
|  INTERVAL '1-0' YE...|
+----------------------+

SELECT make_ym_interval(-1, 1);
+-----------------------+
|make_ym_interval(-1, 1)|
+-----------------------+
|   INTERVAL '-0-11' ...|
+-----------------------+

SELECT make_ym_interval(2);
+----------------------+
|make_ym_interval(2, 0)|
+----------------------+
|  INTERVAL '2-0' YE...|
+----------------------+
minute(timestamp) 描述

返回字符串或者时间戳的分钟部分。

实践
SELECT minute('2009-07-30 12:58:59');
+---------------------------+
|minute(2009-07-30 12:58:59)|
+---------------------------+
|                         58|
+---------------------------+

SELECT minute(current_timestamp());
+---------------------------+
|minute(current_timestamp())|
+---------------------------+
|                         50|
+---------------------------+
month(date) 描述

返回字符串或者时间戳的月份部分。

实践
SELECT month('2016-07-30');
+-----------------+
|month(2016-07-30)|
+-----------------+
|                7|
+-----------------+

SELECT month(current_timestamp());
+---------------------------+
| month(current_timestamp())|
+---------------------------+
|                         11|
+---------------------------+
months_between(timestamp1, timestamp2[, roundOff]) 描述
  • 如果 timestamp1 晚于timestamp2,则结果为正。
  • 如果 timestamp1 和timestamp2 都是处于所在月份的同一天或者都是所在月份的最后一天,那么这天时间会被忽略。
  • 除非 roundOff=false , 否则的话,区别会基于每月 31 月,并且四舍五入到 8 位。
实践
SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
+-----------------------------------------------------+
|months_between(1997-02-28 10:30:00, 1996-10-30, true)|
+-----------------------------------------------------+
|                                           3.94959677|
+-----------------------------------------------------+

SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false);
+------------------------------------------------------+
|months_between(1997-02-28 10:30:00, 1996-10-30, false)|
+------------------------------------------------------+
|                                    3.9495967741935485|
+------------------------------------------------------+

SELECT months_between('1997-03-30', '1996-10-30');
+-----------------------------------------------------+
|months_between('1997-03-30', '1996-10-30'			  |
+-----------------------------------------------------+
|                                           	   5.0|
+-----------------------------------------------------+

SELECT months_between('1997-03-30 10:30:00', '1996-10-30');
+-----------------------------------------------------+
|months_between('1997-03-30 10:30:00', '1996-10-30')  |
+-----------------------------------------------------+
|                                           	   5.0|
+-----------------------------------------------------+
next_day(start_date, day_of_week) 描述
  • 返回第一个日期,该日期晚于 start_date ,并按指示命名。
  • 如果至少有一个输入参数为NULL,则函数返回NULL。
  • 如果两个输入参数都不为NULL,并且 day_of_week 为无效输入
    • 如果’spark.sql.ansi.enabled’设置为true,则函数将抛出IllegalArgumentException
    • 否则为NULL。
实践
SELECT next_day('2015-01-14', 'TU');
+------------------------+
|next_day(2015-01-14, TU)|
+------------------------+
|              2015-01-20|
+------------------------+

SELECT next_day('2015-01-14', NULL);
+--------------------------+
|next_day(2015-01-14, NULL)|
+--------------------------+
|              	     NULL  |
+--------------------------+

SELECT next_day('2015-01-14', 1);
+--------------------------+
|next_day(2015-01-14, NULL)|
+--------------------------+
|              	     NULL  |
+--------------------------+

1、星期日:Sunday [ˈsʌndi]],英文缩写:SUN/SU。
2、星期一:Monday [ˈmʌndi],英文缩写:MON/MO。
3、星期二:Tuesday [ˈtju:zdi],英文缩写:TUE/TU。
4、星期三:Wednesday [ˈwenzdi],英文缩写:WED/WE。
5、星期四:Thursday [ˈθə:zdi]],英文缩写:THU/TH。
6、星期五:Friday [ˈfraidi],英文缩写:FRI/FR。
7、星期六:Saturday [ˈsætədi]],英文缩写:SAT/SA。

now() 描述

返回查询评估起始阶段的当前时间戳。

实践
SELECT now();
+-----------------------+
|                  now()|
+-----------------------+
|2021-11-25 23:15:23.389|
+-----------------------+
quarter(date) 描述
  • 返回当前所在一年的四分之几。
  • 返回值 1,2,3,4。
实践
SELECT quarter('2016-08-31');
+-------------------+
|quarter(2016-08-31)|
+-------------------+
|                  3|
+-------------------+

SELECT quarter(now());
+-------------------+
|	  quarter(now())|
+-------------------+
|                  4|
+-------------------+
second(timestamp) 描述

返回字符串或者时间戳的秒部分。

实践
SELECT second('2009-07-30 12:58:59');
+---------------------------+
|second(2009-07-30 12:58:59)|
+---------------------------+
|                         59|
+---------------------------+

SELECT second(now());
+---------------------------+
|second(now())|
+---------------------------+
|                          2|
+---------------------------+
session_window(time_column, gap_duration) 描述
  • Spark Structured Streaming 中使用。
  • 通过指定时间字段和间隔周期生成会话窗口。
实践
SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start;
+---+-------------------+-------------------+---+
|  a|              start|                end|cnt|
+---+-------------------+-------------------+---+
| A1|2021-01-01 00:00:00|2021-01-01 00:09:30|  2|
| A1|2021-01-01 00:10:00|2021-01-01 00:15:00|  1|
| A2|2021-01-01 00:01:00|2021-01-01 00:06:00|  1|
+---+-------------------+-------------------+---+

SELECt a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start;
+---+-------------------+-------------------+---+
|  a|              start|                end|cnt|
+---+-------------------+-------------------+---+
| A1|2021-01-01 00:00:00|2021-01-01 00:09:30|  2|
| A1|2021-01-01 00:10:00|2021-01-01 00:15:00|  1|
| A2|2021-01-01 00:01:00|2021-01-01 00:02:00|  1|
| A2|2021-01-01 00:04:30|2021-01-01 00:05:30|  1|
+---+-------------------+-------------------+---+
timestamp_micros(microseconds) 描述

从 UTC 历元后的微秒数创建时间戳。

实践
SELECt timestamp_micros(1230219000123123);
+----------------------------------+
|timestamp_micros(1230219000123123)|
+----------------------------------+
|              2008-12-25 15:30:...|
+----------------------------------+
timestamp_millis(milliseconds) 描述

从 UTC 历元后的毫秒数创建时间戳。

实践
SELECT timestamp_millis(1230219000123);
+-------------------------------+
|timestamp_millis(1230219000123)|
+-------------------------------+
|           2008-12-25 15:30:...|
+-------------------------------+
timestamp_seconds(seconds) 描述
  • 从 UTC 历元后的秒数创建时间戳。
  • 这个数值可以很小。
实践
SELECT timestamp_seconds(1230219000);
+-----------------------------+
|timestamp_seconds(1230219000)|
+-----------------------------+
|          2008-12-25 15:30:00|
+-----------------------------+

SELECT timestamp_seconds(1230219000.123);
+---------------------------------+
|timestamp_seconds(1230219000.123)|
+---------------------------------+
|             2008-12-25 15:30:...|
+---------------------------------+

SELECT timestamp_seconds(0.00123);
+---------------------------------+
|timestamp_seconds(1230219000.123)|
+---------------------------------+
|  		 1970-01-01 08:00:00.00123|
+---------------------------------+
to_date(date_str[, fmt]) 描述
  • 将日期字符串 date_str 解析成指定的日期字符串格式fmt。
  • 无效输入返回 NULL。
  • 默认情况下,如果 fmt 没有设置,则会强制只保留日期。
实践
SELECT to_date('2009-07-30 04:17:52');
+----------------------------+
|to_date(2009-07-30 04:17:52)|
+----------------------------+
|                  2009-07-30|
+----------------------------+

SELECT to_date('2016-12-31', 'yyyy-MM-dd');
+-------------------------------+
|to_date(2016-12-31, yyyy-MM-dd)|
+-------------------------------+
|                     2016-12-31|
+-------------------------------+
to_timestamp(timestamp_str[, fmt]) 描述
  • 将时间戳字符串 timestamp_str 解析成指定的日期字符串格式fmt。
  • 无效输入返回 NULL。
  • 默认情况下,如果 fmt 没有设置,则会强制只保留日期。
  • 结果数据类型和配置spark.sql.timestampType 保持一致。
实践
SELECT to_timestamp('2016-12-31 00:12:00');
+---------------------------------+
|to_timestamp(2016-12-31 00:12:00)|
+---------------------------------+
|              2016-12-31 00:12:00|
+---------------------------------+

SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
+------------------------------------+
|to_timestamp(2016-12-31, yyyy-MM-dd)|
+------------------------------------+
|                 2016-12-31 00:00:00|
+------------------------------------+
to_unix_timestamp(timeExp[, fmt]) 描述

返回给定时间的 UNIX 时间戳。

实践
SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
+-----------------------------------------+
|to_unix_timestamp(2016-04-08, yyyy-MM-dd)|
+-----------------------------------------+
|                               1460073600|
+-----------------------------------------+
to_utc_timestamp(timestamp, timezone) 描述
  • 给定一个时间戳 timestamp,基于 UTC 方式和给定的时区 timezone 来解释成时间。
  • 比如:to_utc_timestamp(‘2017-07-14 02:40:00.0’ ,‘GMT+1’ )将会返回 ‘2017-07-14 01:40:00’
实践
SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
+------------------------------------------+
|  to_utc_timestamp(2016-08-31, Asia/Seoul)|
+------------------------------------------+
|                       2016-08-30 15:00:00|
+------------------------------------------+

SELECT to_utc_timestamp('2021-11-25', 'Asia/Shanghai');

+-------------------------------------------+
|to_utc_timestamp(2021-11-25, Asia/Shanghai)|
+-------------------------------------------+
|                        2021-11-24 16:00:00|
+-------------------------------------------+

SELECT to_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1' );

+----------------------------------------------------+
|  to_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1')|
+----------------------------------------------------+
|                       		  2017-07-14 01:40:00|
+----------------------------------------------------+

建议和上面的 from_utc_timestamp(timestamp, timezone) 结合起来理解。

trunc(date, fmt) 描述

date 截取到指定的时间单位 fmt 的第一天。

实践
SELECt trunc('2019-08-04', 'week');
+-----------------------+
|trunc(2019-08-04, week)|
+-----------------------+
|             2019-07-29|
+-----------------------+

SELECT trunc('2019-08-04', 'quarter');
+--------------------------+
|trunc(2019-08-04, quarter)|
+--------------------------+
|                2019-07-01|
+--------------------------+

SELECT trunc('2009-02-12', 'MM');
+---------------------+
|trunc(2009-02-12, MM)|
+---------------------+
|           2009-02-01|
+---------------------+

SELECT trunc('2015-10-27', 'YEAR');
+-----------------------+
|trunc(2015-10-27, YEAR)|
+-----------------------+
|             2015-01-01|
+-----------------------+
unix_date(date) 描述

返回从 1970-01-01 到 date 过了多少天。

实践
SELECT unix_date(DATE("1970-01-02"));
+---------------------+
|unix_date(1970-01-02)|
+---------------------+
|                    1|
+---------------------+
unix_micros(timestamp) 描述

返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少微秒。

实践
SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'));
+---------------------------------+
|unix_micros(1970-01-01 00:00:01Z)|
+---------------------------------+
|                          1000000|
+---------------------------------+
unix_millis(timestamp) 描述
  • 返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少毫秒。
  • 截取高级别的精度。
实践
SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'));
+---------------------------------+
|unix_millis(1970-01-01 00:00:01Z)|
+---------------------------------+
|                             1000|
+---------------------------------+
unix_seconds(timestamp) 描述
  • 返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少秒。
  • 截取高级别的精度。
实践
SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'));
+----------------------------------+
|unix_seconds(1970-01-01 00:00:01Z)|
+----------------------------------+
|                                 1|
+----------------------------------+
unix_timestamp([timeExp[, fmt]]) 描述

返回当前或者特定时间的 UNIX 时间戳。

实践
SELECT unix_timestamp();
+--------------------------------------------------------+
|unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss)|
+--------------------------------------------------------+
|                                              1633529631|
+--------------------------------------------------------+

SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
+--------------------------------------+
|unix_timestamp(2016-04-08, yyyy-MM-dd)|
+--------------------------------------+
|                            1460073600|
+--------------------------------------+

weekday(date) 描述
  • 返回日期或者时间戳所在星期处于第几天。
  • 星期天返回 0,星期六返回 6。
实践
SELECT weekday('2009-07-30');
+-------------------+
|weekday(2009-07-30)|
+-------------------+
|                  3|
+-------------------+
weekofyear(date) 描述
  • 返回给定日期所在年份处于第几个星期。
  • 星期从星期一开始。
  • 第一周指的大于 3 天的第一周。
实践
SELECT weekofyear('2008-02-20');
+----------------------+
|weekofyear(2008-02-20)|
+----------------------+
|                     8|
+----------------------+
window(time_column, window_duration[, slide_duration[, start_time]]) 描述
  • Spark Structured Streaming 中使用。
  • 给定指定列的时间戳,将行压缩为一个或多个时间窗口。
  • 窗口开始是包含的,但窗口结束是独占的,例如12:05将在窗口[12:05,12:10]中,但不在[12:00,12:05]中。
  • 窗口可以支持微秒精度。
  • 不支持月数窗口。。
实践
SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start;
+---+-------------------+-------------------+---+
|  a|              start|                end|cnt|
+---+-------------------+-------------------+---+
| A1|2021-01-01 00:00:00|2021-01-01 00:05:00|  2|
| A1|2021-01-01 00:05:00|2021-01-01 00:10:00|  1|
| A2|2021-01-01 00:00:00|2021-01-01 00:05:00|  1|
+---+-------------------+-------------------+---+

SELECt a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start;
+---+-------------------+-------------------+---+
|  a|              start|                end|cnt|
+---+-------------------+-------------------+---+
| A1|2020-12-31 23:55:00|2021-01-01 00:05:00|  2|
| A1|2021-01-01 00:00:00|2021-01-01 00:10:00|  3|
| A1|2021-01-01 00:05:00|2021-01-01 00:15:00|  1|
| A2|2020-12-31 23:55:00|2021-01-01 00:05:00|  1|
| A2|2021-01-01 00:00:00|2021-01-01 00:10:00|  1|
+---+-------------------+-------------------+---+
year(date)

返回日期或者时间戳的年份部分。

SELECT year('2016-07-30');
+----------------+
|year(2016-07-30)|
+----------------+
|            2016|
+----------------+

欢迎分享,转载请注明来源:内存溢出

原文地址: https://outofmemory.cn/zaji/5605422.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-15
下一篇 2022-12-15

发表评论

登录后才能评论

评论列表(0条)

保存