New column cannot be written after adding a field to a Hive table: use CASCADE

In practice, table schemas often need to change, for example to add a new column.

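If the column is added with a plain ALTER TABLE tb ADD COLUMNS statement (without CASCADE), col1 is added successfully. However, if table tb already has older partitions (for example dt=20190101), col1 in those partitions reads as NULL and cannot be updated. Even running INSERT OVERWRITE on the partition has no effect.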
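A minimal sketch of how the problem shows up. The original does not give the table layout, so the data column id, the STRING type of the partition column dt, and the STRING type of col1 are all assumptions; only tb, col1 and dt=20190101 come from the text.

```sql
-- Hypothetical schema: only the names tb, col1 and dt come from the article.
CREATE TABLE tb (id INT) PARTITIONED BY (dt STRING);

-- Create an "old" partition before the schema change.
INSERT OVERWRITE TABLE tb PARTITION (dt='20190101') SELECT 1;

-- Add the column without CASCADE (RESTRICT is the default):
-- only the table-level metadata changes, not the existing partition's.
ALTER TABLE tb ADD COLUMNS (col1 STRING);

-- Rewrite the old partition, now supplying a value for col1.
INSERT OVERWRITE TABLE tb PARTITION (dt='20190101') SELECT 1, 'x';

-- Per the behavior described above, col1 still reads back as NULL here,
-- because the partition metadata does not know about the new column.
SELECT id, col1 FROM tb WHERE dt='20190101';
```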

Solution:

The fix is simple: append the CASCADE keyword to the ALTER TABLE ... ADD COLUMNS statement that adds col1.

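An easy way to remember this is the meaning of the word "cascade": the change is applied not only to the schema (metadata) used by new partitions, but also to the schema of existing partitions.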
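A minimal sketch of the CASCADE form, reusing the table tb and column col1 from the text. The STRING types of col1 and of the partition column dt are assumptions, since the original does not state them.

```sql
-- Add col1 and propagate the change to every existing partition's metadata.
-- Requires a Hive version that supports the CASCADE clause.
ALTER TABLE tb ADD COLUMNS (col1 STRING) CASCADE;

-- Inspect the partition-level schema; col1 should now be listed for the old partition too.
DESCRIBE tb PARTITION (dt='20190101');
```

Comparing the output of DESCRIBE tb with DESCRIBE tb PARTITION (...) is a quick way to see whether table-level and partition-level metadata have diverged.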

ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for Hive 0.14 and later.

REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to Hive SerDe for more information. REPLACE COLUMNS can also be used to drop columns. For example, "ALTER TABLE test_change REPLACE COLUMNS (a int, b int)" will remove column 'c' from test_change's schema.

The PARTITION clause is available in Hive 0.14.0 and later; see Upgrading Pre-Hive 0.13.0 Decimal Columns for usage.

The CASCADE|RESTRICT clause is available in Hive 1.1.0. ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata.
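If the column was already added without CASCADE, one possible repair (a sketch, not something the original article describes) is to use the PARTITION clause mentioned above to add the column to each affected partition's metadata. The STRING types of col1 and dt and the inserted values are assumptions for illustration.

```sql
-- Table-level metadata already contains col1 (added earlier with the default RESTRICT).
-- Bring one old partition's metadata in line with the table:
ALTER TABLE tb PARTITION (dt='20190101') ADD COLUMNS (col1 STRING);

-- Data written from now on should expose col1 when the partition is read.
INSERT OVERWRITE TABLE tb PARTITION (dt='20190101') SELECT 1, 'x';
SELECT id, col1 FROM tb WHERE dt='20190101';
```

This has to be repeated for every old partition, which is why adding the column with CASCADE in the first place is the simpler route.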

Method 1: insert the control characters directly with an editor, using Vi as an example.

Open Vi:

    $ vi supply-20110101.txt

In Vi command mode, type :set list to make control characters visible. Once this is set, Vi immediately shows an end-of-line marker $.

Fill in the data for each column the Hive table needs. For example, here I need to create a partitioned table:

    hive (ch09)> create table supply (id int, part string, quantity int) partitioned by (day int);
    hive (ch09)> alter table supply add partition (day=20110101);
    hive (ch09)> alter table supply add partition (day=20110102);
    hive (ch09)> alter table supply add partition (day=20110103);

Three columns of data are needed: id, part and quantity. Enter insert mode in Vi and type:

    10part10100$

Here I want 10 as the id, part10 as the part and 100 as the quantity; the trailing $ is the end-of-line marker. Then move the cursor to where a delimiter is needed, press Ctrl+V first, and then press the field delimiter Ctrl+A:

    10^Apart10100$

Insert the remaining delimiters the same way and finish editing:

    10^Apart10^A100$
    11^Apart11^A90$
    12^Apart12^A110$
    13^Apart13^A80$

The data can now be loaded into the Hive table:

    hive (ch09)> load data local inpath '${env:HOME}/data/supply-20110103.txt' overwrite into table supply
               > partition (day='20110103');
    Copying data from file:/root/data/supply-20110103.txt
    Copying file: file:/root/data/supply-20110103.txt
    Loading data to table ch09.supply partition (day=20110103)
    rmr: DEPRECATED: Please use 'rm -r' instead.
    Moved: 'hdfs://n8.example.com:8020/user/hive/warehouse/ch09.db/supply/day=20110103' to trash at: hdfs://n8.example.com:8020/user/root/.Trash/Current
    Partition ch09.supply{day=20110103} stats: [num_files: 1, num_rows: 0, total_size: 54, raw_data_size: 0]
    Table ch09.supply stats: [num_partitions: 3, num_files: 3, num_rows: 0, total_size: 147, raw_data_size: 0]
    OK
    Time taken: 0.522 seconds

Check the data that was just loaded to make sure it is correct:

    hive (ch09)> select * from supply where day='20110103';
    OK
    id    part      quantity    day
    10    part10    100         20110103
    11    part11    90          20110103
    12    part12    110         20110103
    13    part13    80          20110103
    Time taken: 0.229 seconds

The data is correct. Note also that Hive turns a select * like this into a file system operation and does not generate any MapReduce job.

Method 2: define custom delimiters for the Hive table.

    CREATE TABLE supply (id INT, part STRING, quantity INT)
    PARTITIONED BY (day INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '.'
    COLLECTION ITEMS TERMINATED BY ','
    MAP KEYS TERMINATED BY '='
    STORED AS SEQUENCEFILE;

This avoids control characters altogether.

Source: /blog/1922887

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original address: http://outofmemory.cn/bake/11852255.html
