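Background:
In day-to-day work, JSON data is usually parsed with get_json_object or json_tuple. These two functions are the common choice when JSON fields sit alongside regular columns. When every record is a pure JSON text document, however, we can instead create a table backed by a JSON SerDe, so that the table schema maps directly onto the JSON and the data can be queried or aggregated like any other table.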
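For comparison, the function-based approach mentioned above might look like the following minimal sketch. The table raw_events and its columns id and payload are hypothetical names introduced only for illustration; payload is assumed to be a string column holding one JSON document per row.

```sql
-- Hypothetical table: raw_events(id bigint, payload string)
-- get_json_object extracts one value per call using a JSONPath-like expression.
select
  id,
  get_json_object(payload, '$.student.name') as student_name,
  get_json_object(payload, '$.class.score')  as score
from raw_events;

-- json_tuple extracts several top-level keys in a single pass;
-- every extracted value is returned as a string.
select e.id, t.student, t.teacher
from raw_events e
lateral view json_tuple(e.payload, 'student', 'teacher') t as student, teacher;
```

Both functions re-parse the JSON string at query time, which is one reason the table-mapping approach can be more convenient when the whole record is JSON.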
Sample data (one JSON record per line):
{"student":{"name":"xiaowang","age":11,"sex":"M"},"class":{"book":"语文","level":2,"score":81},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"xiaoming","age":12,"sex":"M"},"class":{"book":"语文","level":2,"score":82},"teacher":{"name":"t2","class":"语文"}}
{"student":{"name":"xiaolan","age":13,"sex":"M"},"class":{"book":"语文","level":2,"score":83},"teacher":{"name":"t3","class":"语文"}}
{"student":{"name":"xiaohei","age":14,"sex":"M"},"class":{"book":"语文","level":2,"score":84},"teacher":{"name":"t1","class":"语文"}}
{"student":{"name":"xiaobai","age":15,"sex":"M"},"class":{"book":"语文","level":2,"score":86},"teacher":{"name":"t2","class":"语文"}}
{"student":{"name":"xiaohong","age":16,"sex":"M"},"class":{"book":"语文","level":2,"score":87},"teacher":{"name":"t3","class":"语文"}}

2. The CREATE TABLE statement is as follows:
The SerDe class we need: org.apache.hive.hcatalog.data.JsonSerDe
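This class ships in the hive-hcatalog-core jar. If the session reports that the class cannot be found, registering that jar first usually helps. The path below is a placeholder, not a real location; the actual path and file name depend on the installation.

```sql
-- Placeholder path (assumption): point this at the hive-hcatalog-core jar
-- of your own Hive installation.
add jar /path/to/hive-hcatalog-core.jar;
```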
create external table test.test_student_json(
  student map<string,string> comment "student info",
  class map<string,string> comment "course info",
  teacher map<string,string> comment "teacher info")
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';

3. How to query the data:
Query the outer-level JSON data:
spark-sql> select student from test.test_student_json;
{"age":"11","name":"xiaowang","sex":"M"}
{"age":"12","name":"xiaoming","sex":"M"}
{"age":"13","name":"xiaolan","sex":"M"}
{"age":"14","name":"xiaohei","sex":"M"}
{"age":"15","name":"xiaobai","sex":"M"}
{"age":"16","name":"xiaohong","sex":"M"}
Time taken: 2.424 seconds, Fetched 6 row(s)
Query the inner-level JSON data:
spark-sql> select student['name'] from test.test_student_json;
xiaowang
xiaoming
xiaolan
xiaohei
xiaobai
xiaohong
Time taken: 0.173 seconds, Fetched 6 row(s)
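The same key lookup can feed filters and aggregations. The sketch below assumes the three columns are declared as map<string,string>, which the quoted "age" values in the query output suggest, so numeric values are cast before use. The column aliases are illustrative.

```sql
-- Map values come back as strings, so cast them before arithmetic
-- or numeric comparison.
select
  teacher['name']                       as teacher_name,
  count(*)                              as student_count,
  avg(cast(`class`['score'] as double)) as avg_score
from test.test_student_json
where cast(student['age'] as int) >= 13
group by teacher['name'];
```

The column name class is wrapped in backticks as a precaution, in case the engine treats it as a keyword.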
For data that consists entirely of JSON text, mapping it through a JSON table in this way is quite convenient.