本文隶属于专栏《1000个问题搞定大数据技术体系》,该专栏为笔者原创,引用请注明来源,不足和错误之处请在评论区帮忙指出,谢谢!
正文 add_months(start_date, num_months) 描述本专栏目录结构和参考文献请见1000个问题搞定大数据技术体系
返回start_date 加上 num_months 月后的日期。
实践SELECt add_months('2016-08-31', 1); +-------------------------+ |add_months(2016-08-31, 1)| +-------------------------+ | 2016-09-30| +-------------------------+current_date() 描述
- 返回查询评估起始阶段的当前日期。
- 同一个查询所有的 current_date 返回的值都是相同的。
SELECT current_date(); +--------------+ |current_date()| +--------------+ | 2021-10-06| +--------------+current_date 描述
- 返回查询评估起始阶段的当前日期。
SELECT current_date; +--------------+ |current_date()| +--------------+ | 2021-10-06| +--------------+current_timestamp() 描述
- 返回查询评估起始阶段的当前时间戳。
- 同一个查询所有的 current_timestamp 返回的值都是相同的。
SELECT current_timestamp(); +--------------------+ | current_timestamp()| +--------------------+ |2021-10-06 14:13:...| +--------------------+current_timestamp 描述
- 返回查询评估起始阶段的当前时间戳。
SELECT current_timestamp; +--------------------+ | current_timestamp()| +--------------------+ |2021-10-06 14:13:...| +--------------------+current_timezone() 描述
返回当前 session 的本地时区。
实践SELECT current_timezone(); +------------------+ |current_timezone()| +------------------+ | Etc/UTC| +------------------+date_add(start_date, num_days) 描述
返回start_date 加上 num_days 天后的日期。
实践SELECT date_add('2016-07-30', 1); +-----------------------+ |date_add(2016-07-30, 1)| +-----------------------+ | 2016-07-31| +-----------------------+date_format(timestamp, fmt) 描述
基于特定的日期格式fmt转换时间戳 timestamp 为字符串。
实践SELECT date_format('2016-04-08', 'y'); +--------------------------+ |date_format(2016-04-08, y)| +--------------------------+ | 2016| +--------------------------+date_from_unix_date(days) 描述
- 创建一个日期
- 这个日期从 1970-01-01 开始过了 days 天。
SELECt date_from_unix_date(1); +----------------------+ |date_from_unix_date(1)| +----------------------+ | 1970-01-02| +----------------------+date_part(field, source) 描述
- 抽取 source 中的一部分。
- source 代表日期、时间戳或者时间间隔 。
SELECt date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456'); +-------------------------------------------------------+ |date_part(YEAR, TIMESTAMP '2019-08-12 01:00:00.123456')| +-------------------------------------------------------+ | 2019| +-------------------------------------------------------+ SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456'); +-------------------------------------------------------+ |date_part(week, TIMESTAMP '2019-08-12 01:00:00.123456')| +-------------------------------------------------------+ | 33| +-------------------------------------------------------+ SELECT date_part('doy', DATE'2019-08-12'); +---------------------------------+ |date_part(doy, DATE '2019-08-12')| +---------------------------------+ | 224| +---------------------------------+ SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001'); +----------------------------------------------------------+ |date_part(SECONDS, TIMESTAMP '2019-10-01 00:00:01.000001')| +----------------------------------------------------------+ | 1.000001| +----------------------------------------------------------+ SELECT date_part('days', interval 5 days 3 hours 7 minutes); +-------------------------------------------------+ |date_part(days, INTERVAL '5 03:07' DAY TO MINUTE)| +-------------------------------------------------+ | 5| +-------------------------------------------------+ SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds); +-------------------------------------------------------------+ |date_part(seconds, INTERVAL '05:00:30.001001' HOUR TO SECOND)| +-------------------------------------------------------------+ | 30.001001| +-------------------------------------------------------------+ SELECT date_part('MONTH', INTERVAL '2021-11' YEAR TO MONTH); +--------------------------------------------------+ |date_part(MONTH, INTERVAL '2021-11' YEAR TO MONTH)| +--------------------------------------------------+ | 11| +--------------------------------------------------+ SELECT date_part('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND); +---------------------------------------------------------------+ |date_part(MINUTE, INTERVAL '123 23:55:59.002001' DAY TO SECOND)| +---------------------------------------------------------------+ | 55| +---------------------------------------------------------------+
date_sub(start_date, num_days) 描述doy 代表一年中的第几天。
dow 代表一周中的第几天。
返回start_date 减去 num_days 天后的日期。
实践SELECT date_sub('2016-07-30', 1); +-----------------------+ |date_sub(2016-07-30, 1)| +-----------------------+ | 2016-07-29| +-----------------------+date_trunc(fmt, ts) 描述
- 针对时间 ts 执行截取 *** 作。
- 截取到指定的单位 fmt 。
SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359'); +-----------------------------------------+ |date_trunc(YEAR, 2015-03-05T09:32:05.359)| +-----------------------------------------+ | 2015-01-01 00:00:00| +-----------------------------------------+ SELECT date_trunc('MM', '2015-03-05T09:32:05.359'); +---------------------------------------+ |date_trunc(MM, 2015-03-05T09:32:05.359)| +---------------------------------------+ | 2015-03-01 00:00:00| +---------------------------------------+ SELECT date_trunc('DD', '2015-03-05T09:32:05.359'); +---------------------------------------+ |date_trunc(DD, 2015-03-05T09:32:05.359)| +---------------------------------------+ | 2015-03-05 00:00:00| +---------------------------------------+ SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359'); +-----------------------------------------+ |date_trunc(HOUR, 2015-03-05T09:32:05.359)| +-----------------------------------------+ | 2015-03-05 09:00:00| +-----------------------------------------+ SELECT date_trunc('MILLISECOND', '2015-03-05T09:32:05.123456'); +---------------------------------------------------+ |date_trunc(MILLISECOND, 2015-03-05T09:32:05.123456)| +---------------------------------------------------+ | 2015-03-05 09:32:...| +---------------------------------------------------+datediff(endDate, startDate) 描述
返回从 startDate 到 endDate 之间的天数。
实践SELECT datediff('2009-07-31', '2009-07-30'); +--------------------------------+ |datediff(2009-07-31, 2009-07-30)| +--------------------------------+ | 1| +--------------------------------+ SELECT datediff('2009-07-30', '2009-07-31'); +--------------------------------+ |datediff(2009-07-30, 2009-07-31)| +--------------------------------+ | -1| +--------------------------------+day(date) 描述
返回日期或者时间戳在所在月份是第几天。
实践SELECT day('2009-07-30'); +---------------+ |day(2009-07-30)| +---------------+ | 30| +---------------+dayofmonth(date) 描述
返回日期或者时间戳在所在月份是第几天。
实践SELECT dayofmonth('2009-07-30'); +----------------------+ |dayofmonth(2009-07-30)| +----------------------+ | 30| +----------------------+
dayofweek(date) 描述day(date)和 dayofmonth(date) 是一样的。
- 返回日期或者时间戳在所在星期是第几天。
- 0 代表星期天,6 代表星期六。
SELECT dayofweek('2009-07-30'); +---------------------+ |dayofweek(2009-07-30)| +---------------------+ | 5| +---------------------+dayofyear(date) 描述
- 返回日期或者时间戳在所在年份是第几天。
SELECT dayofyear('2016-04-09'); +---------------------+ |dayofyear(2016-04-09)| +---------------------+ | 100| +---------------------+extract(field FROM source) 描述
- 抽取 source 中的一部分。
- source 代表日期、时间戳或者时间间隔 。
SELECt extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456'); +---------------------------------------------------------+ |extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456')| +---------------------------------------------------------+ | 2019| +---------------------------------------------------------+ SELECt extract(week FROM timestamp'2019-08-12 01:00:00.123456'); +---------------------------------------------------------+ |extract(week FROM TIMESTAMP '2019-08-12 01:00:00.123456')| +---------------------------------------------------------+ | 33| +---------------------------------------------------------+ SELECt extract(doy FROM DATE'2019-08-12'); +-----------------------------------+ |extract(doy FROM DATE '2019-08-12')| +-----------------------------------+ | 224| +-----------------------------------+ SELECt extract(SEConDS FROM timestamp'2019-10-01 00:00:01.000001'); +------------------------------------------------------------+ |extract(SEConDS FROM TIMESTAMP '2019-10-01 00:00:01.000001')| +------------------------------------------------------------+ | 1.000001| +------------------------------------------------------------+ SELECt extract(days FROM interval 5 days 3 hours 7 minutes); +---------------------------------------------------+ |extract(days FROM INTERVAL '5 03:07' DAY TO MINUTE)| +---------------------------------------------------+ | 5| +---------------------------------------------------+ SELECt extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds); +---------------------------------------------------------------+ |extract(seconds FROM INTERVAL '05:00:30.001001' HOUR TO SECOND)| +---------------------------------------------------------------+ | 30.001001| +---------------------------------------------------------------+ SELECt extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH); +----------------------------------------------------+ |extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH)| +----------------------------------------------------+ | 11| +----------------------------------------------------+ SELECt extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND); +-----------------------------------------------------------------+ |extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND)| +-----------------------------------------------------------------+ | 55| +-----------------------------------------------------------------+
from_unixtime(unix_time[, fmt]) 描述date_part(field, source) 和 extract(field FROM source) 是一样的用法。
返回特定日期格式 fmt 的 unix_time 。
实践SELECt from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); +-------------------------------------+ |from_unixtime(0, yyyy-MM-dd HH:mm:ss)| +-------------------------------------+ | 1970-01-01 00:00:00| +-------------------------------------+ SELECt from_unixtime(0); +-------------------------------------+ |from_unixtime(0, yyyy-MM-dd HH:mm:ss)| +-------------------------------------+ | 1970-01-01 00:00:00| +-------------------------------------+from_utc_timestamp(timestamp, timezone) 描述
- 给定一个时间戳 timestamp,基于 UTC 方式和给定的时区 timezone 来解释成时间。
- 比如:from_utc_timestamp(‘2017-07-14 02:40:00.0’ ,‘GMT+1’ )将会返回 ‘2017-07-14 03:40:00’
SELECt from_utc_timestamp('2016-08-31', 'Asia/Seoul'); +------------------------------------------+ |from_utc_timestamp(2016-08-31, Asia/Seoul)| +------------------------------------------+ | 2016-08-31 09:00:00| +------------------------------------------+ SELECt from_utc_timestamp('2021-11-25', 'Asia/Shanghai'); +------------------------------------------+ |from_utc_timestamp(2021-11-25, Asia/Shanghai)| +------------------------------------------+ | 2021-11-25 08:00:00| +------------------------------------------+ SELECt from_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1' ); +----------------------------------------------------+ |from_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1')| +----------------------------------------------------+ | 2017-07-14 03:40:00| +----------------------------------------------------+hour(timestamp) 描述
返回字符串或者时间戳的小时部分。
实践SELECt hour('2009-07-30 12:58:59'); +-------------------------+ |hour(2009-07-30 12:58:59)| +-------------------------+ | 12| +-------------------------+ SELECT hour(current_timestamp()); +-------------------------+ |hour(current_timestamp())| +-------------------------+ | 22| +-------------------------+last_day(date) 描述
返回日期所在月份的最后一天。
实践SELECT last_day('2009-01-12'); +--------------------+ |last_day(2009-01-12)| +--------------------+ | 2009-01-31| +--------------------+make_date(year, month, day) 描述
根据年月日来创建日期。
实践SELECT make_date(2013, 7, 15); +----------------------+ |make_date(2013, 7, 15)| +----------------------+ | 2013-07-15| +----------------------+ SELECT make_date(2019, 13, 1); +----------------------+ |make_date(2019, 13, 1)| +----------------------+ | null| +----------------------+ SELECT make_date(2019, 7, NULL); +------------------------+ |make_date(2019, 7, NULL)| +------------------------+ | null| +------------------------+ SELECT make_date(2019, 2, 30); +----------------------+ |make_date(2019, 2, 30)| +----------------------+ | null| +----------------------+make_dt_interval([days[, hours[, mins[, secs]]]]) 描述
基于 天、小时、分、秒 来创建一个时间间隔。
实践SELECT make_dt_interval(1, 12, 30, 01.001001); +-------------------------------------+ |make_dt_interval(1, 12, 30, 1.001001)| +-------------------------------------+ | INTERVAL '1 12:30...'| +-------------------------------------+ SELECT make_dt_interval(2); +-----------------------------------+ |make_dt_interval(2, 0, 0, 0.000000)| +-----------------------------------+ | INTERVAL '2 00:00...'| +-----------------------------------+ SELECT make_dt_interval(100, null, 3); +----------------------------------------+ |make_dt_interval(100, NULL, 3, 0.000000)| +----------------------------------------+ | null| +----------------------------------------+make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) 描述
基于 年、月、星期、日、时、分、秒 来创建一个时间间隔。
实践SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001); +----------------------------------------------+ |make_interval(100, 11, 1, 1, 12, 30, 1.001001)| +----------------------------------------------+ | 100 years 11 mont...| +----------------------------------------------+ SELECT make_interval(100, null, 3); +----------------------------------------------+ |make_interval(100, NULL, 3, 0, 0, 0, 0.000000)| +----------------------------------------------+ | null| +----------------------------------------------+ SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001); +-------------------------------------------+ |make_interval(0, 1, 0, 1, 0, 0, 100.000001)| +-------------------------------------------+ | 1 months 1 days 1...| +-------------------------------------------+make_timestamp(year, month, day, hour, min, sec[, timezone]) 描述
- 基于 年、月、日、时、分、秒、时区来创建一个时间戳。
- 结果数据类型和配置 spark.sql.timestampType 定义的类型一致。
ELECT make_timestamp(2014, 12, 28, 6, 30, 45.887); +-------------------------------------------+ |make_timestamp(2014, 12, 28, 6, 30, 45.887)| +-------------------------------------------+ | 2014-12-28 06:30:...| +-------------------------------------------+ SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET'); +------------------------------------------------+ |make_timestamp(2014, 12, 28, 6, 30, 45.887, CET)| +------------------------------------------------+ | 2014-12-28 05:30:...| +------------------------------------------------+ SELECT make_timestamp(2019, 6, 30, 23, 59, 60); +---------------------------------------+ |make_timestamp(2019, 6, 30, 23, 59, 60)| +---------------------------------------+ | 2019-07-01 00:00:00| +---------------------------------------+ SELECT make_timestamp(2019, 6, 30, 23, 59, 1); +--------------------------------------+ |make_timestamp(2019, 6, 30, 23, 59, 1)| +--------------------------------------+ | 2019-06-30 23:59:01| +--------------------------------------+ SELECT make_timestamp(2019, 13, 1, 10, 11, 12, 'PST'); +--------------------------------------------+ |make_timestamp(2019, 13, 1, 10, 11, 12, PST)| +--------------------------------------------+ | null| +--------------------------------------------+ SELECT make_timestamp(null, 7, 22, 15, 30, 0); +--------------------------------------+ |make_timestamp(NULL, 7, 22, 15, 30, 0)| +--------------------------------------+ | null| +--------------------------------------+make_ym_interval([years[, months]]) 描述
创建一个基于 年、月 的时间间隔。
实践SELECT make_ym_interval(1, 2); +----------------------+ |make_ym_interval(1, 2)| +----------------------+ | INTERVAL '1-2' YE...| +----------------------+ SELECT make_ym_interval(1, 0); +----------------------+ |make_ym_interval(1, 0)| +----------------------+ | INTERVAL '1-0' YE...| +----------------------+ SELECT make_ym_interval(-1, 1); +-----------------------+ |make_ym_interval(-1, 1)| +-----------------------+ | INTERVAL '-0-11' ...| +-----------------------+ SELECT make_ym_interval(2); +----------------------+ |make_ym_interval(2, 0)| +----------------------+ | INTERVAL '2-0' YE...| +----------------------+minute(timestamp) 描述
返回字符串或者时间戳的分钟部分。
实践SELECT minute('2009-07-30 12:58:59'); +---------------------------+ |minute(2009-07-30 12:58:59)| +---------------------------+ | 58| +---------------------------+ SELECT minute(current_timestamp()); +---------------------------+ |minute(current_timestamp())| +---------------------------+ | 50| +---------------------------+month(date) 描述
返回字符串或者时间戳的月份部分。
实践SELECT month('2016-07-30'); +-----------------+ |month(2016-07-30)| +-----------------+ | 7| +-----------------+ SELECT month(current_timestamp()); +---------------------------+ | month(current_timestamp())| +---------------------------+ | 11| +---------------------------+months_between(timestamp1, timestamp2[, roundOff]) 描述
- 如果 timestamp1 晚于timestamp2,则结果为正。
- 如果 timestamp1 和timestamp2 都是处于所在月份的同一天或者都是所在月份的最后一天,那么这天时间会被忽略。
- 除非 roundOff=false , 否则的话,区别会基于每月 31 月,并且四舍五入到 8 位。
SELECT months_between('1997-02-28 10:30:00', '1996-10-30'); +-----------------------------------------------------+ |months_between(1997-02-28 10:30:00, 1996-10-30, true)| +-----------------------------------------------------+ | 3.94959677| +-----------------------------------------------------+ SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false); +------------------------------------------------------+ |months_between(1997-02-28 10:30:00, 1996-10-30, false)| +------------------------------------------------------+ | 3.9495967741935485| +------------------------------------------------------+ SELECT months_between('1997-03-30', '1996-10-30'); +-----------------------------------------------------+ |months_between('1997-03-30', '1996-10-30' | +-----------------------------------------------------+ | 5.0| +-----------------------------------------------------+ SELECT months_between('1997-03-30 10:30:00', '1996-10-30'); +-----------------------------------------------------+ |months_between('1997-03-30 10:30:00', '1996-10-30') | +-----------------------------------------------------+ | 5.0| +-----------------------------------------------------+next_day(start_date, day_of_week) 描述
- 返回第一个日期,该日期晚于 start_date ,并按指示命名。
- 如果至少有一个输入参数为NULL,则函数返回NULL。
- 如果两个输入参数都不为NULL,并且 day_of_week 为无效输入
-
- 如果’spark.sql.ansi.enabled’设置为true,则函数将抛出IllegalArgumentException
-
- 否则为NULL。
SELECT next_day('2015-01-14', 'TU'); +------------------------+ |next_day(2015-01-14, TU)| +------------------------+ | 2015-01-20| +------------------------+ SELECT next_day('2015-01-14', NULL); +--------------------------+ |next_day(2015-01-14, NULL)| +--------------------------+ | NULL | +--------------------------+ SELECT next_day('2015-01-14', 1); +--------------------------+ |next_day(2015-01-14, NULL)| +--------------------------+ | NULL | +--------------------------+
now() 描述1、星期日:Sunday [ˈsʌndi]],英文缩写:SUN/SU。
2、星期一:Monday [ˈmʌndi],英文缩写:MON/MO。
3、星期二:Tuesday [ˈtju:zdi],英文缩写:TUE/TU。
4、星期三:Wednesday [ˈwenzdi],英文缩写:WED/WE。
5、星期四:Thursday [ˈθə:zdi]],英文缩写:THU/TH。
6、星期五:Friday [ˈfraidi],英文缩写:FRI/FR。
7、星期六:Saturday [ˈsætədi]],英文缩写:SAT/SA。
返回查询评估起始阶段的当前时间戳。
实践SELECT now(); +-----------------------+ | now()| +-----------------------+ |2021-11-25 23:15:23.389| +-----------------------+quarter(date) 描述
- 返回当前所在一年的四分之几。
- 返回值 1,2,3,4。
SELECT quarter('2016-08-31'); +-------------------+ |quarter(2016-08-31)| +-------------------+ | 3| +-------------------+ SELECT quarter(now()); +-------------------+ | quarter(now())| +-------------------+ | 4| +-------------------+second(timestamp) 描述
返回字符串或者时间戳的秒部分。
实践SELECT second('2009-07-30 12:58:59'); +---------------------------+ |second(2009-07-30 12:58:59)| +---------------------------+ | 59| +---------------------------+ SELECT second(now()); +---------------------------+ |second(now())| +---------------------------+ | 2| +---------------------------+session_window(time_column, gap_duration) 描述
- Spark Structured Streaming 中使用。
- 通过指定时间字段和间隔周期生成会话窗口。
SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start; +---+-------------------+-------------------+---+ | a| start| end|cnt| +---+-------------------+-------------------+---+ | A1|2021-01-01 00:00:00|2021-01-01 00:09:30| 2| | A1|2021-01-01 00:10:00|2021-01-01 00:15:00| 1| | A2|2021-01-01 00:01:00|2021-01-01 00:06:00| 1| +---+-------------------+-------------------+---+ SELECt a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start; +---+-------------------+-------------------+---+ | a| start| end|cnt| +---+-------------------+-------------------+---+ | A1|2021-01-01 00:00:00|2021-01-01 00:09:30| 2| | A1|2021-01-01 00:10:00|2021-01-01 00:15:00| 1| | A2|2021-01-01 00:01:00|2021-01-01 00:02:00| 1| | A2|2021-01-01 00:04:30|2021-01-01 00:05:30| 1| +---+-------------------+-------------------+---+timestamp_micros(microseconds) 描述
从 UTC 历元后的微秒数创建时间戳。
实践SELECt timestamp_micros(1230219000123123); +----------------------------------+ |timestamp_micros(1230219000123123)| +----------------------------------+ | 2008-12-25 15:30:...| +----------------------------------+timestamp_millis(milliseconds) 描述
从 UTC 历元后的毫秒数创建时间戳。
实践SELECT timestamp_millis(1230219000123); +-------------------------------+ |timestamp_millis(1230219000123)| +-------------------------------+ | 2008-12-25 15:30:...| +-------------------------------+timestamp_seconds(seconds) 描述
- 从 UTC 历元后的秒数创建时间戳。
- 这个数值可以很小。
SELECT timestamp_seconds(1230219000); +-----------------------------+ |timestamp_seconds(1230219000)| +-----------------------------+ | 2008-12-25 15:30:00| +-----------------------------+ SELECT timestamp_seconds(1230219000.123); +---------------------------------+ |timestamp_seconds(1230219000.123)| +---------------------------------+ | 2008-12-25 15:30:...| +---------------------------------+ SELECT timestamp_seconds(0.00123); +---------------------------------+ |timestamp_seconds(1230219000.123)| +---------------------------------+ | 1970-01-01 08:00:00.00123| +---------------------------------+to_date(date_str[, fmt]) 描述
- 将日期字符串 date_str 解析成指定的日期字符串格式fmt。
- 无效输入返回 NULL。
- 默认情况下,如果 fmt 没有设置,则会强制只保留日期。
SELECT to_date('2009-07-30 04:17:52'); +----------------------------+ |to_date(2009-07-30 04:17:52)| +----------------------------+ | 2009-07-30| +----------------------------+ SELECT to_date('2016-12-31', 'yyyy-MM-dd'); +-------------------------------+ |to_date(2016-12-31, yyyy-MM-dd)| +-------------------------------+ | 2016-12-31| +-------------------------------+to_timestamp(timestamp_str[, fmt]) 描述
- 将时间戳字符串 timestamp_str 解析成指定的日期字符串格式fmt。
- 无效输入返回 NULL。
- 默认情况下,如果 fmt 没有设置,则会强制只保留日期。
- 结果数据类型和配置spark.sql.timestampType 保持一致。
SELECT to_timestamp('2016-12-31 00:12:00'); +---------------------------------+ |to_timestamp(2016-12-31 00:12:00)| +---------------------------------+ | 2016-12-31 00:12:00| +---------------------------------+ SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd'); +------------------------------------+ |to_timestamp(2016-12-31, yyyy-MM-dd)| +------------------------------------+ | 2016-12-31 00:00:00| +------------------------------------+to_unix_timestamp(timeExp[, fmt]) 描述
返回给定时间的 UNIX 时间戳。
实践SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd'); +-----------------------------------------+ |to_unix_timestamp(2016-04-08, yyyy-MM-dd)| +-----------------------------------------+ | 1460073600| +-----------------------------------------+to_utc_timestamp(timestamp, timezone) 描述
- 给定一个时间戳 timestamp,基于 UTC 方式和给定的时区 timezone 来解释成时间。
- 比如:to_utc_timestamp(‘2017-07-14 02:40:00.0’ ,‘GMT+1’ )将会返回 ‘2017-07-14 01:40:00’
SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul'); +------------------------------------------+ | to_utc_timestamp(2016-08-31, Asia/Seoul)| +------------------------------------------+ | 2016-08-30 15:00:00| +------------------------------------------+ SELECT to_utc_timestamp('2021-11-25', 'Asia/Shanghai'); +-------------------------------------------+ |to_utc_timestamp(2021-11-25, Asia/Shanghai)| +-------------------------------------------+ | 2021-11-24 16:00:00| +-------------------------------------------+ SELECT to_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1' ); +----------------------------------------------------+ | to_utc_timestamp('2017-07-14 02:40:00.0' ,'GMT+1')| +----------------------------------------------------+ | 2017-07-14 01:40:00| +----------------------------------------------------+
trunc(date, fmt) 描述建议和上面的 from_utc_timestamp(timestamp, timezone) 结合起来理解。
date 截取到指定的时间单位 fmt 的第一天。
实践SELECt trunc('2019-08-04', 'week'); +-----------------------+ |trunc(2019-08-04, week)| +-----------------------+ | 2019-07-29| +-----------------------+ SELECT trunc('2019-08-04', 'quarter'); +--------------------------+ |trunc(2019-08-04, quarter)| +--------------------------+ | 2019-07-01| +--------------------------+ SELECT trunc('2009-02-12', 'MM'); +---------------------+ |trunc(2009-02-12, MM)| +---------------------+ | 2009-02-01| +---------------------+ SELECT trunc('2015-10-27', 'YEAR'); +-----------------------+ |trunc(2015-10-27, YEAR)| +-----------------------+ | 2015-01-01| +-----------------------+unix_date(date) 描述
返回从 1970-01-01 到 date 过了多少天。
实践SELECT unix_date(DATE("1970-01-02")); +---------------------+ |unix_date(1970-01-02)| +---------------------+ | 1| +---------------------+unix_micros(timestamp) 描述
返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少微秒。
实践SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z')); +---------------------------------+ |unix_micros(1970-01-01 00:00:01Z)| +---------------------------------+ | 1000000| +---------------------------------+unix_millis(timestamp) 描述
- 返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少毫秒。
- 截取高级别的精度。
SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z')); +---------------------------------+ |unix_millis(1970-01-01 00:00:01Z)| +---------------------------------+ | 1000| +---------------------------------+unix_seconds(timestamp) 描述
- 返回 从1970-01-01 00:00:00 UTC 到 timestamp 过了多少秒。
- 截取高级别的精度。
SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z')); +----------------------------------+ |unix_seconds(1970-01-01 00:00:01Z)| +----------------------------------+ | 1| +----------------------------------+unix_timestamp([timeExp[, fmt]]) 描述
返回当前或者特定时间的 UNIX 时间戳。
实践SELECT unix_timestamp(); +--------------------------------------------------------+ |unix_timestamp(current_timestamp(), yyyy-MM-dd HH:mm:ss)| +--------------------------------------------------------+ | 1633529631| +--------------------------------------------------------+ SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd'); +--------------------------------------+ |unix_timestamp(2016-04-08, yyyy-MM-dd)| +--------------------------------------+ | 1460073600| +--------------------------------------+weekday(date) 描述
- 返回日期或者时间戳所在星期处于第几天。
- 星期天返回 0,星期六返回 6。
SELECT weekday('2009-07-30'); +-------------------+ |weekday(2009-07-30)| +-------------------+ | 3| +-------------------+weekofyear(date) 描述
- 返回给定日期所在年份处于第几个星期。
- 星期从星期一开始。
- 第一周指的大于 3 天的第一周。
SELECT weekofyear('2008-02-20'); +----------------------+ |weekofyear(2008-02-20)| +----------------------+ | 8| +----------------------+window(time_column, window_duration[, slide_duration[, start_time]]) 描述
- Spark Structured Streaming 中使用。
- 给定指定列的时间戳,将行压缩为一个或多个时间窗口。
- 窗口开始是包含的,但窗口结束是独占的,例如12:05将在窗口[12:05,12:10]中,但不在[12:00,12:05]中。
- 窗口可以支持微秒精度。
- 不支持月数窗口。。
SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start; +---+-------------------+-------------------+---+ | a| start| end|cnt| +---+-------------------+-------------------+---+ | A1|2021-01-01 00:00:00|2021-01-01 00:05:00| 2| | A1|2021-01-01 00:05:00|2021-01-01 00:10:00| 1| | A2|2021-01-01 00:00:00|2021-01-01 00:05:00| 1| +---+-------------------+-------------------+---+ SELECt a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start; +---+-------------------+-------------------+---+ | a| start| end|cnt| +---+-------------------+-------------------+---+ | A1|2020-12-31 23:55:00|2021-01-01 00:05:00| 2| | A1|2021-01-01 00:00:00|2021-01-01 00:10:00| 3| | A1|2021-01-01 00:05:00|2021-01-01 00:15:00| 1| | A2|2020-12-31 23:55:00|2021-01-01 00:05:00| 1| | A2|2021-01-01 00:00:00|2021-01-01 00:10:00| 1| +---+-------------------+-------------------+---+year(date)
返回日期或者时间戳的年份部分。
SELECT year('2016-07-30'); +----------------+ |year(2016-07-30)| +----------------+ | 2016| +----------------+
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)