A Spark job writing to a Parquet-format Hive table fails with ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
The error stack is:
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:333)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:321)
... 8 more
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
Solution: First, locate which map<?, ?> or array<?> column is triggering the error. (One way to narrow it down is by bisection: manually set the map/array values to null for half of the rows and retry; if the write succeeds, the offending data is in the other half. Keep halving the remaining rows this way until the bad records are isolated.)
Second, once the column is found, filter out the empty values yourself before writing.
The error fires whenever an empty collection is inserted into such a column, or when a map contains a null key.
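The filtering step above can be sketched as a pair of helper functions (a hypothetical sketch, not code from the original post; in a PySpark job these would be registered as UDFs and applied to the offending columns before the insert):

```python
def sanitize_map(m):
    """Return None for map values this Parquet writer cannot encode:
    an empty map, or a map containing a null (None) key."""
    if m is None or len(m) == 0 or None in m:
        return None
    return m

def sanitize_array(a):
    """Return None instead of an empty array, which the writer also rejects."""
    if a is None or len(a) == 0:
        return None
    return a
```

Equivalently, if you prefer staying in Spark SQL, an expression along the lines of `CASE WHEN size(col) = 0 THEN NULL ELSE col END` should achieve the same effect for the empty-collection case.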
It turns out this is a known upstream issue, already discussed at https://issues.apache.org/jira/browse/HIVE-11625
At the time of writing, no released version fixes this problem.