本文隶属于专栏《1000个问题搞定大数据技术体系》,该专栏为笔者原创,引用请注明来源,不足和错误之处请在评论区帮忙指出,谢谢!
正文 from_json(jsonStr, schema[, options]) 描述本专栏目录结构和参考文献请见1000个问题搞定大数据技术体系
返回基于给定 jsonStr 和 schema 的结构体类型。
实践SELECt from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE'); +---------------------------+ |from_json({"a":1, "b":0.8})| +---------------------------+ | {1, 0.8}| +---------------------------+ SELECt from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy')); +--------------------------------+ |from_json({"time":"26/08/2015"})| +--------------------------------+ | {"time":2015-08-26 00:00:00}| +--------------------------------+options 支持列表
从 json 文本 json_txt中基于 路径 path 抽取出一个 JSON 对象。
实践SELECT get_json_object('{"a":"b"}', '$.a'); +-------------------------------+ |get_json_object({"a":"b"}, $.a)| +-------------------------------+ | b| +-------------------------------+path 语法
使用$表示json变量标识,用 . 或 [] 读取对象或数组
get 单层值spark-sql> SELECT get_json_object('{ "store": { "fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }', '$.owner'); amyget 多层值
spark-sql> SELECT get_json_object('{ "store": { "fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }', '$.store.bicycle.price'); 19.95get 数组值
spark-sql> SELECT get_json_object('{ "store": { "fruit":[{"weight":8,"type":"apple"}, {"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net", "owner":"amy" }', '$.store.fruit[0]'); {"weight":8,"type":"apple"}json_array_length(jsonArray) 描述
返回最外层的 JSON 数组jsonArray中元素的数量。
实践SELECT json_array_length('[1,2,3,4]'); +----------------------------+ |json_array_length([1,2,3,4])| +----------------------------+ | 4| +----------------------------+ SELECT json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]'); +------------------------------------------------+ |json_array_length([1,2,3,{"f1":1,"f2":[5,6]},4])| +------------------------------------------------+ | 5| +------------------------------------------------+ SELECT json_array_length('[1,2'); +-----------------------+ |json_array_length([1,2)| +-----------------------+ | null| +-----------------------+json_object_keys(json_object) 描述
返回最外层 JSON 对象 json_object 的所有 keys 组成的数组。
实践ELECT json_object_keys('{}'); +--------------------+ |json_object_keys({})| +--------------------+ | []| +--------------------+ SELECT json_object_keys('{"key": "value"}'); +----------------------------------+ |json_object_keys({"key": "value"})| +----------------------------------+ | [key]| +----------------------------------+ SELECT json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}'); +--------------------------------------------------------+ |json_object_keys({"f1":"abc","f2":{"f3":"a", "f4":"b"}})| +--------------------------------------------------------+ | [f1, f2]| +--------------------------------------------------------+json_tuple(jsonStr, p1, p2, …, pn) 描述
- 返回一个类似函数 get_json_object 的元组。
- 这个元组可以有多个名称。
- 所有的输入参数和输出字段类型都是 String 类型。
SELECT json_tuple('{"a":1, "b":2}', 'a', 'b'); +---+---+ | c0| c1| +---+---+ | 1| 2| +---+---+schema_of_json(json[, options]) 描述
以 DDL 的形式返回 JSON 字符串 json 的 schema 形式。
实践SELECT schema_of_json('[{"col":0}]'); +----------------------------+ |schema_of_json([{"col":0}])| +----------------------------+ |ARRAYto_json(expr[, options]) 描述>| +----------------------------+ SELECT schema_of_json('[{"col":01}]', map('allowNumericLeadingZeros', 'true')); +----------------------------+ |schema_of_json([{"col":01}])| +----------------------------+ |ARRAY >| +----------------------------+
通过给定的结构体返回一个 JSON 字符串。
实践SELECT to_json(named_struct('a', 1, 'b', 2)); +---------------------------------+ |to_json(named_struct(a, 1, b, 2))| +---------------------------------+ | {"a":1,"b":2}| +---------------------------------+ SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')); +-----------------------------------------------------------------+ |to_json(named_struct(time, to_timestamp(2015-08-26, yyyy-MM-dd)))| +-----------------------------------------------------------------+ | {"time":"26/08/2015"}| +-----------------------------------------------------------------+ SELECT to_json(array(named_struct('a', 1, 'b', 2))); +----------------------------------------+ |to_json(array(named_struct(a, 1, b, 2)))| +----------------------------------------+ | [{"a":1,"b":2}]| +----------------------------------------+ SELECT to_json(map('a', named_struct('b', 1))); +-----------------------------------+ |to_json(map(a, named_struct(b, 1)))| +-----------------------------------+ | {"a":{"b":1}}| +-----------------------------------+ SELECT to_json(map(named_struct('a', 1),named_struct('b', 2))); +----------------------------------------------------+ |to_json(map(named_struct(a, 1), named_struct(b, 2)))| +----------------------------------------------------+ | {"[1]":{"b":2}}| +----------------------------------------------------+ SELECT to_json(map('a', 1)); +------------------+ |to_json(map(a, 1))| +------------------+ | {"a":1}| +------------------+ SELECT to_json(array((map('a', 1)))); +-------------------------+ |to_json(array(map(a, 1)))| +-------------------------+ | [{"a":1}]| +-------------------------+
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)