Data Lake Iceberg in Practice, Lesson 11: End-to-end test of a partitioned table (generate data, create table, compact files, expire snapshots)


Series Table of Contents

Data Lake Iceberg in Practice, Lesson 1: Getting started
Data Lake Iceberg in Practice, Lesson 2: Iceberg's underlying data layout on Hadoop
Data Lake Iceberg in Practice, Lesson 3: Reading from Kafka into Iceberg with SQL in sql-client
Data Lake Iceberg in Practice, Lesson 4: Reading from Kafka into Iceberg with SQL in sql-client (upgraded to Flink 1.12.7)
Data Lake Iceberg in Practice, Lesson 5: Characteristics of the Hive catalog
Data Lake Iceberg in Practice, Lesson 6: Fixing failures when writing from Kafka into Iceberg
Data Lake Iceberg in Practice, Lesson 7: Writing into Iceberg in real time
Data Lake Iceberg in Practice, Lesson 8: Integrating Hive with Iceberg
Data Lake Iceberg in Practice, Lesson 9: Compacting small files
Data Lake Iceberg in Practice, Lesson 10: Expiring snapshots
Data Lake Iceberg in Practice, Lesson 11: End-to-end test of a partitioned table (generate data, create table, compact files, expire snapshots)
Data Lake Iceberg in Practice, Lesson 12: What is a catalog


Table of Contents

Series Table of Contents
Preface
1. Generate the data
  1.1 Generate the log data
  1.2 log -> Flume -> Kafka
2. Create and test the partitioned table
  2.1 Start flink-sql
  2.2 Create the Hive catalog
  2.3 Create the partitioned table
3. Write the Kafka behavior data into the partitioned table
  3.1 Define the Kafka table with the csv format (failed)
  3.2 Define the Kafka table with the raw format (succeeded)
  3.3 SQL that writes the Kafka table into the partitioned table
  3.4 Full set of SQL actually executed in flink-sql
4. Repeat steps 1 and 3 to generate data for several days
5. Compact small files and observe
6. Compact, expire old snapshots, and observe
Summary


Preface

Test small-file compaction and snapshot expiration on a partitioned table, and observe how they affect the table.

Simulating a production environment.
Test architecture: log -> Flume -> Kafka -> Flink -> Iceberg


1. Generate the data

1.1 Generate the log data

Requirements: the range of generated ids, the generation rate, and the date carried by the data must all be configurable.

package org.example;


import java.util.Calendar;
import java.util.Random;


public class GenerateLogWithDate {
    public static void main(String[] args) throws InterruptedException {
        if (args.length != 5) {
            System.out.println("usage: <max id> <sleep per record in ms> <year> <month 0-11> <day>");
            System.exit(0);
        }
        int len = Integer.parseInt(args[0]);          // upper bound of the random id
        int sleepMillis = Integer.parseInt(args[1]);  // pause between records
        Calendar cal = Calendar.getInstance();
        // month is 0-based, so "2022 0 27" means 2022-01-27
        cal.set(Integer.parseInt(args[2]), Integer.parseInt(args[3]), Integer.parseInt(args[4]));

        Random random = new Random();
        for (int i = 0; i < 10000000; i++) {
            // one record per line: sequence,randomId,eventTimeMillis
            // the event time is anchored to the configured date so each run lands in the intended dt partition
            System.out.println(i + "," + random.nextInt(len) + "," + cal.getTimeInMillis());
            Thread.sleep(sleepMillis);
        }
    }
}

Package it into a jar and run it; the output looks like this:

[root@hadoop101 software]# java -jar log-generater-1.0-SNAPSHOT.jar 100000 1000 2022 0 20
0,54598,1643341060767
1,71915,1643341061768
2,69469,1643341062768
3,7125,1643341063768
  java -jar /opt/software/log-generater-1.0-SNAPSHOT.jar 100000 1000 2022 0 20 > /opt/module/logs/withDateFileToKakfa.log
[root@hadoop102 ~]# java -jar /opt/software/log-generater-1.0-SNAPSHOT.jar 100000 10 2022 0 27 >  /opt/module/logs/withDateFileToKakfa.log
Switch the date and append more data:
[root@hadoop102 ~]# java -jar /opt/software/log-generater-1.0-SNAPSHOT.jar 1000000 1 2022 0 27 >>  /opt/module/logs/withDateFileToKakfa.log
[root@hadoop102 ~]# java -jar /opt/software/log-generater-1.0-SNAPSHOT.jar 1000000 1 2022 0 26 >>  /opt/module/logs/withDateFileToKakfa.log
[root@hadoop102 ~]# java -jar /opt/software/log-generater-1.0-SNAPSHOT.jar 1000000 1 2022 0 29 >>  /opt/module/logs/withDateFileToKakfa.log
1.2 log -> Flume -> Kafka

First, create the topic behavior_with_date_log in Kafka (for example with kafka-topics.sh; a programmatic alternative is sketched below).
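If you prefer to do this from code, here is a minimal, hypothetical sketch using Kafka's Java AdminClient; the partition count and replication factor are illustrative assumptions, not values taken from the original setup.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateBehaviorTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "hadoop101:9092,hadoop102:9092,hadoop103:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // topic name must match what the Flume channel and the Flink table use;
            // 3 partitions / replication factor 2 are illustrative values
            admin.createTopics(Collections.singleton(
                    new NewTopic("behavior_with_date_log", 3, (short) 2)))
                 .all().get();
        }
    }
}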

Flume agent configuration:
Kafka acts as the channel; there is no sink.
cd $FLUME_HOME/conf
vim with-date-file-to-kafka.conf

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = TAILDIR

a1.sources.r1.positionFile = /var/log/flume/withDate_taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /opt/module/logs/withDateFileToKakfa.log

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop101:9092,hadoop102:9092,hadoop103:9092
a1.channels.c1.kafka.topic = behavior_with_date_log
a1.channels.c1.kafka.consumer.group.id = flume-consumer2

# bind the source to the channel (the Kafka channel itself plays the role of the sink)
a1.sources.r1.channels = c1

Prepare a start/stop script for the agent:

[root@hadoop102 bin]# cat flumeWithDate.sh 
#! /bin/bash

case $1 in
"start"){
        for i in hadoop102
        do
                echo " --------启动 $i flume with-date-file-to-kafka.conf -------"
                ssh $i "source /etc/profile;cd /opt/module/flume; nohup  flume-ng agent --conf conf --conf-file conf/with-date-file-to-kafka.conf  --name a1 -Dflume.root.logger=INFO,console  >/opt/module/flume/logs/with-date-file-to-kafka.conf   2>&1 &"
        done
};;
"stop"){
        for i in hadoop102
        do
                echo " --------停止 $i flume with-date-file-to-kafka.conf -------"
                ssh $i "ps -ef | grep with-date-file-to-kafka.conf  | grep -v grep |awk '{print $2}' | xargs -n1 kill"
        done

};;
esac

Start the agent:

[root@hadoop102 bin]# flumeWithDate.sh start
 -------- starting hadoop102 flume with-date-file-to-kafka.conf -------
/etc/profile: line 49: HISTCONTROL: readonly variable

Check whether the data is actually reaching Kafka.
It is.

Consume a few records to confirm:

[root@hadoop103 ~]#  kafka-console-consumer.sh --topic behavior_with_date_log --bootstrap-server hadoop101:9092,hadoop102:9092,hadoop103:9092 --from-beginning
4129299,24977,1643354586164
4129302,18826,1643354586465
4129255,98763,1643354581760
4129258,68045,1643354582060
4129261,42309,1643354582361
4129264,25737,1643354582661
4129267,63120,1643354582961
4129270,62009,1643354583261
4129273,36358,1643354583562
4129276,18414,1643354583862
4129279,73156,1643354584162
4129282,80180,1643354584463
2. Create and test the partitioned table

2.1 Start flink-sql

[root@hadoop101 ~]# sql-client.sh embedded -j /opt/software/iceberg-flink-runtime-0.12.1.jar -j /opt/software/flink-sql-connector-hive-2.3.6_2.12-1.12.7.jar -j /opt/software/flink-sql-connector-kafka_2.12-1.12.7.jar shell

2.2 Create the Hive catalog

Keep using hive_catalog6 from the earlier lessons.
The catalog has to be re-created in every new sql-client session before it can be used.

Statement to create the catalog:

CREATE CATALOG hive_catalog6 WITH (
  'type'='iceberg',
  'catalog-type'='hive',
  'uri'='thrift://hadoop101:9083',
  'clients'='5',
  'property-version'='1',
  'warehouse'='hdfs:///user/hive/warehouse/hive_catalog6'
);

The databases are shared across these catalogs (they all live in the same Hive metastore):
use catalog hive_catalog6;
create database iceberg_db6;

2.3 Create the partitioned table

CREATE TABLE hive_catalog6.iceberg_db6.behavior_with_date_log_ib (
  i STRING, id STRING, otime STRING, dt STRING
) PARTITIONED BY (
  dt
);

Look at the directories it created:

[root@hadoop103 ~]# hadoop fs -ls -R /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/
drwxr-xr-x   - root supergroup          0 2022-01-28 16:19 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata
-rw-r--r--   2 root supergroup       1793 2022-01-28 16:19 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata/00000-cbb96f75-6438-4cfd-8c09-9da953af8519.metadata.json

A note on DROP TABLE: the files under metadata are deleted, but the table's directory tree down to the metadata directory is kept!
The retained path looks like this:

/user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/metadata
3. Write the Kafka behavior data into the partitioned table

3.1 Define the Kafka table with the csv format (failed)

Define the table with the csv format:


create table behavior_with_date_log
(
i STRING,ID STRING,OTIME STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'behavior_with_date_log',
  'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
  'properties.group.id' = 'rickGroup6',
  'scan.startup.mode' = 'earliest-offset',
  'csv.field-delimiter'=',',
  'csv.ignore-parse-errors' = 'true',
  'format' = 'csv'
)

Every column is defined as STRING, yet the first column comes out garbled and the other two columns are NULL. Why?
Even without 'csv.ignore-parse-errors' = 'true' they still show as NULL.
Is the delimiter wrong? I copied the delimiter straight from the log into the CREATE TABLE and rebuilt the table, with the same result.
(A likely culprit: the Flume Kafka channel serializes events as Flume Avro events by default, i.e. parseAsFlumeEvent = true, so every Kafka record carries a small Avro wrapper in front of the CSV body, which the csv format cannot parse.)


Consuming directly with kafka-console-consumer.sh looks fine, though:

kafka-console-consumer.sh --topic behavior_with_date_log --bootstrap-server hadoop101:9092,hadoop102:9092,hadoop103:9092 --from-beginning
4132950,38902,1643354951608
4132903,27044,1643354946904
4132906,17342,1643354947204
4132909,42787,1643354947504
4132912,70169,1643354947805
4132915,68492,1643354948105
4132918,38690,1643354948405
4132921,53191,1643354948705
4132924,82465,1643354949006

Statement to drop the table before rebuilding it:
drop table behavior_with_date_log

The fallback is to define the table with the raw format. flink-sql has no built-in split function, but it does have a built-in split_index function; curiously, show functions does not list split_index. Why?

3.2 Define the Kafka table with the raw format (succeeded)
create table behavior_with_date_log
(
log STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'behavior_with_date_log',
  'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
  'properties.group.id' = 'rickGroup6',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'raw'
)

Flink SQL>

create table behavior_with_date_log
(
log STRING
) WITH (
'connector' = 'kafka',
'topic' = 'behavior_with_date_log',
'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
'properties.group.id' = 'rickGroup6',
'scan.startup.mode' = 'earliest-offset',
'format' = 'raw'
);
[INFO] Table has been created.

SQL syntax checks:

 select split_index('4135009,73879,1643355157701',',',0),   split_index('4135009,73879,1643355157701',',',1),split_index('4135009,73879,1643355157701',',',2);
 SELECT TO_TIMESTAMP(FROM_UNIXTIME(1513135677000 / 1000, 'yyyy-MM-dd'));

ETL on the raw Kafka log to add a dt column; the SQL is:


 select split_index(log,',',0)  as i,   split_index(log,',',1)  as id,split_index(log,',',2)  as otime , from_unixtime(cast(cast(split_index(log,',',2)   as bigint)/1000 as bigint),'yyyyMMdd') as dt  from behavior_with_date_log ;


3.3 SQL that writes the Kafka table into the partitioned table
insert into hive_catalog6.iceberg_db6.behavior_with_date_log_ib 
 select split_index(log,',',0) as i,   split_index(log,',',1) as id,split_index(log,',',2)  as otime , from_unixtime(cast(cast(split_index(log,',',2)   as bigint)/1000 as bigint),'yyyyMMdd') as dt  from behavior_with_date_log ;
3.4 Full set of SQL actually executed in flink-sql
[root@hadoop101 ~]# sql-client.sh embedded -j /opt/software/iceberg-flink-runtime-0.12.1.jar  -j /opt/software/flink-sql-connector-hive-2.3.6_2.12-1.12.7.jar  -j /opt/software/flink-sql-connector-kafka_2.12-1.12.7.jar  shell 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/flink-1.12.7/lib/log4j-slf4j-impl-2.16.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
No default environment specified.
Searching for '/opt/module/flink-1.12.7/conf/sql-client-defaults.yaml'...found.
Reading default environment from: file:/opt/module/flink-1.12.7/conf/sql-client-defaults.yaml
No session environment specified.

Command history file path: /root/.flink-sql-history
        Welcome! Enter 'HELP;' to list all available commands. 'QUIT;' to exit.


Flink SQL> CREATE CATALOG hive_catalog6 WITH (
>   'type'='iceberg',
>   'catalog-type'='hive',
>   'uri'='thrift://hadoop101:9083',
>   'clients'='5',
>   'property-version'='1',
>   'warehouse'='hdfs:///user/hive/warehouse/hive_catalog6'
> );
2022-01-28 16:20:20,351 INFO  org.apache.hadoop.hive.conf.HiveConf                         [] - Found configuration file file:/opt/module/hive/conf/hive-site.xml
2022-01-28 16:20:20,574 WARN  org.apache.hadoop.hive.conf.HiveConf                         [] - HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
[INFO] Catalog has been created.

Flink SQL> create table behavior_with_date_log
> (
> log STRING
> ) WITH (
>   'connector' = 'kafka',
>   'topic' = 'behavior_with_date_log',
>   'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
>   'properties.group.id' = 'rickGroup6',
>   'scan.startup.mode' = 'earliest-offset',
>   'format' = 'raw'
> )
> ;
[INFO] Table has been created.

Flink SQL> insert into hive_catalog6.iceberg_db6.behavior_with_date_log_ib 
>  select split_index(log,',',0) as i,   split_index(log,',',1) as id,split_index(log,',',2)  as otime , from_unixtime(cast(cast(split_index(log,',',2)   as bigint)/1000 as bigint),'yyyyMMdd') as dt  from behavior_with_date_log ;
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: 78930f941991e19112d3917fd4dd4cb2


4. Repeat steps 1 and 3 to generate data for several days

The output files (screenshot omitted):

One minute later (screenshot omitted):

Why two new files every minute? Most likely because the Iceberg sink commits data files on every Flink checkpoint: with a one-minute checkpoint interval and a write parallelism of 2 (note the 00000-0-... and 00001-0-... prefixes in the listing below), each checkpoint adds one file per writer subtask.

5. Compact small files and observe

First, count the small files (the compaction job itself is sketched at the end of this section).

[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data
Found 6 items
drwxr-xr-x   - root supergroup          0 2022-01-28 17:36 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=19700101
drwxr-xr-x   - root supergroup          0 2022-01-28 16:36 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220104
drwxr-xr-x   - root supergroup          0 2022-01-28 17:30 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220126
drwxr-xr-x   - root supergroup          0 2022-01-28 17:29 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220127
drwxr-xr-x   - root supergroup          0 2022-01-28 16:29 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128
drwxr-xr-x   - root supergroup          0 2022-01-28 17:37 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220129

Data files for the 28th:

[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128
Found 18 items
-rw-r--r--   2 root supergroup     441875 2022-01-28 16:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00001.parquet
-rw-r--r--   2 root supergroup       2842 2022-01-28 16:23 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00003.parquet
-rw-r--r--   2 root supergroup       2819 2022-01-28 16:24 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00004.parquet
-rw-r--r--   2 root supergroup       2833 2022-01-28 16:25 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00005.parquet
-rw-r--r--   2 root supergroup       2801 2022-01-28 16:26 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00006.parquet
-rw-r--r--   2 root supergroup       2824 2022-01-28 16:27 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00007.parquet
-rw-r--r--   2 root supergroup       2802 2022-01-28 16:28 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00008.parquet
-rw-r--r--   2 root supergroup       2781 2022-01-28 16:29 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00009.parquet
-rw-r--r--   2 root supergroup       2000 2022-01-28 16:30 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00000-0-0c8e94fd-5c8e-48df-ac03-4fe0e109940f-00010.parquet
-rw-r--r--   2 root supergroup     882110 2022-01-28 16:22 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00001.parquet
-rw-r--r--   2 root supergroup       4385 2022-01-28 16:23 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00003.parquet
-rw-r--r--   2 root supergroup       4364 2022-01-28 16:24 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00004.parquet
-rw-r--r--   2 root supergroup       4383 2022-01-28 16:25 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00005.parquet
-rw-r--r--   2 root supergroup       4340 2022-01-28 16:26 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00006.parquet
-rw-r--r--   2 root supergroup       4357 2022-01-28 16:27 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00007.parquet
-rw-r--r--   2 root supergroup       4347 2022-01-28 16:28 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00008.parquet
-rw-r--r--   2 root supergroup       4340 2022-01-28 16:29 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00009.parquet
-rw-r--r--   2 root supergroup       2688 2022-01-28 16:30 /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128/00001-0-29418337-5d27-40bb-98cb-94b3b5007b99-00010.parquet
[root@hadoop103 ~]# hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128 |wc
     19     147    3813

In the wc output above, subtract the "Found N items" header line from the last column count: 19 - 1 = 18, i.e. 18 files.

[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=19700101 |wc
     43     339    8877
[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220104 |wc
     17     131    3391
[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220126 |wc
      5      35     858
[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220127 |wc
    109     867   22804
[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220128 |wc
     19     147    3813
[root@hadoop103 ~]#  hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220129 |wc
     21     163    4235

Data for the 29th is still being written:

  [root@hadoop103 ~]#   hadoop fs -ls /user/hive/warehouse/hive_catalog6/iceberg_db6.db/behavior_with_date_log_ib/data/dt=20220129 |wc
     23     179    4657
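With the small-file counts recorded, the compaction itself is run the same way as in Lesson 9, just pointed at this partitioned table. Below is a minimal sketch using the iceberg-flink 0.12 Actions API; the class name CompactBehaviorTable is hypothetical, the catalog properties simply mirror the hive_catalog6 settings above, and the 128 MB target file size is an illustrative choice.

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFilesActionResult;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.actions.Actions;

import java.util.HashMap;
import java.util.Map;

public class CompactBehaviorTable {
    public static void main(String[] args) throws Exception {
        // same connection settings as the hive_catalog6 catalog created in flink-sql
        Map<String, String> props = new HashMap<>();
        props.put("type", "iceberg");
        props.put("catalog-type", "hive");
        props.put("uri", "thrift://hadoop101:9083");
        props.put("warehouse", "hdfs:///user/hive/warehouse/hive_catalog6");

        CatalogLoader catalogLoader = CatalogLoader.hive("hive_catalog6", new Configuration(), props);
        TableLoader tableLoader = TableLoader.fromCatalog(catalogLoader,
                TableIdentifier.of("iceberg_db6", "behavior_with_date_log_ib"));
        tableLoader.open();
        Table table = tableLoader.loadTable();

        // rewrite small data files into files of roughly 128 MB; each partition is compacted independently
        RewriteDataFilesActionResult result = Actions.forTable(table)
                .rewriteDataFiles()
                .targetSizeInBytes(128 * 1024 * 1024L)
                .execute();
        System.out.println("data files rewritten: " + result.deletedDataFiles().size()
                + " -> " + result.addedDataFiles().size());
    }
}

Note that rewriteDataFiles only commits a new snapshot in which the small files are replaced; the old files stay on HDFS until the old snapshots are expired, which is the next step.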
6. Compact, expire old snapshots, and observe

The run did not finish; to be continued after the Spring Festival holiday.
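For reference, the snapshot-expiration step was going to follow the same pattern as Lesson 10, run against the same table. A minimal sketch (ExpireBehaviorTableSnapshots is a hypothetical class name, and the retention settings of one hour / five snapshots are illustrative assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.CatalogLoader;
import org.apache.iceberg.flink.TableLoader;

import java.util.HashMap;
import java.util.Map;

public class ExpireBehaviorTableSnapshots {
    public static void main(String[] args) throws Exception {
        Map<String, String> props = new HashMap<>();
        props.put("type", "iceberg");
        props.put("catalog-type", "hive");
        props.put("uri", "thrift://hadoop101:9083");
        props.put("warehouse", "hdfs:///user/hive/warehouse/hive_catalog6");

        // load behavior_with_date_log_ib through the same hive_catalog6 settings used in flink-sql
        CatalogLoader catalogLoader = CatalogLoader.hive("hive_catalog6", new Configuration(), props);
        TableLoader tableLoader = TableLoader.fromCatalog(catalogLoader,
                TableIdentifier.of("iceberg_db6", "behavior_with_date_log_ib"));
        tableLoader.open();
        Table table = tableLoader.loadTable();

        // expire snapshots older than one hour but always keep the five most recent ones;
        // this is what actually removes the small data files replaced by the compaction
        long olderThan = System.currentTimeMillis() - 60 * 60 * 1000L;
        table.expireSnapshots()
                .expireOlderThan(olderThan)
                .retainLast(5)
                .commit();
    }
}

The attempt currently aborts while talking to the Hive metastore with the exception below; a DataNucleus ClassNotFoundException like this usually means the client fell back to an embedded metastore and the datanucleus/JDO jars are missing from the job's classpath.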

Caused by: javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
NestedThrowables:
java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:521)
	at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:550)
	at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:405)
	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:659)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	... 24 more
Caused by: java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2018)
	at javax.jdo.JDOHelper$18.run(JDOHelper.java:2016)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.jdo.JDOHelper.forName(JDOHelper.java:2015)
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1162)
	... 45 more

Process finished with exit code 1


Summary

Reposted from 内存溢出; original article: https://outofmemory.cn/zaji/5720169.html
