TPC-H测试版权说明: 本文由博主keep丶原创,转载请注明出处。
原文地址: https://blog.csdn.net/qq_38688267/article/details/122535189
简介工具准备
下载编译 生成测试数据
初始化表结构
PGHIVEMySQL 使用DBGEN生成数据加载数据文件到DB
PGHIVEMySQL 标准测试SQL参考
简介TPC-H(商业智能计算测试) 是美国交易处理效能委员会(TPC,Transaction Processing Performance Council) 组织制定的用来模拟决策支持类应用的一个测试集.目前,在学术界和工业界普遍采用它来评价决策支持技术方面应用的性能. 这种商业测试可以全方位评测系统的整体商业计算综合能力,对厂商的要求更高,同时也具有普遍的商业实用意义,目前在银行信贷分析和xyk分析、电信运营分析、税收分析、烟草行业决策分析中都有广泛的应用。
官网:http://www.tpc.org/tpch/
工具准备 下载
官方下载地址:http://tpc.org/tpc_documents_current_versions/download_programs/tools-download-request5.asp?bm_type=TPC-H&bm_vers=3.0.0&mode=CURRENT-ONLYTPC-HV3.0.0资源地址:https://download.csdn.net/download/qq_38688267/75664932 编译
作者的数据库在Linux服务器上,因此是在Linux上编译,如果是windows系统,则使用gcc命令编译,请参考https://blog.csdn.net/qilimi1053620912/article/details/88573017
具体步骤如下:
# 解压 unzip master.zip # 解压后得到tpch-dbgen-master文件夹 cd tpch-dbgen-master # 修改makefile.suite vim makefile.suite # CC = gcc # DATAbase= ORACLE # MACHINE = LINUX # WORKLOAD = TPCH ## DATAbase和MACHINE根据实际情况和文件中的提示填写 ## DATAbase目前支持INFORMIX, DB2, TDAT (Teradata), SQLSERVER, SYbase, ORACLE。如果是HIVE、PG、MySQL之类的都可以填ORACLE. # 编译 make -f makefile.suite # 编译完成后会生成dbgen文件,如果没有或者报错则表示编译错误,需要再检查一下makefile.suite文件内容。生成测试数据 初始化表结构
在tpch-dbgen-master/dss.ddl中定义了表初始化语句,部分数据库可能语法不一样,稍微改一下其语法即可,下文将以postgresql, hive, mysql为例,说明其建表语句的变化。
PG无需变化,直接使用即可。
HIVEhive数据库需要指定一下列分隔字符为’|’,因为dbgen生成的数据是以’|'符号分隔的。
CREATE TABLE NATION ( N_NATIonKEY INTEGER, N_NAME CHAR ( 25 ), N_REGIonKEY INTEGER, N_COMMENT VARCHAR ( 152 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE REGION ( R_REGIonKEY INTEGER, R_NAME CHAR ( 25 ), R_COMMENT VARCHAR ( 152 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE PART (P_PARTKEY INTEGER,P_NAME VARCHAR ( 55 ),P_MFGR CHAR ( 25 ),P_BRAND CHAR ( 10 ),P_TYPE VARCHAR ( 25 ),P_SIZE INTEGER,P_ConTAINER CHAR ( 10 ),P_RETAILPRICE DECIMAL ( 15, 2 ),P_COMMENT VARCHAR ( 23 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE SUPPLIER (S_SUPPKEY INTEGER,S_NAME CHAR ( 25 ),S_ADDRESS VARCHAR ( 40 ),S_NATIonKEY INTEGER,S_PHONE CHAR ( 15 ),S_ACCTBAL DECIMAL ( 15, 2 ),S_COMMENT VARCHAR ( 101 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER, PS_SUPPKEY INTEGER, PS_AVAILQTY INTEGER, PS_SUPPLYCOST DECIMAL ( 15, 2 ), PS_COMMENT VARCHAR ( 199 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE CUSTOMER (C_CUSTKEY INTEGER,C_NAME VARCHAR ( 25 ),C_ADDRESS VARCHAR ( 40 ),C_NATIonKEY INTEGER,C_PHONE CHAR ( 15 ),C_ACCTBAL DECIMAL ( 15, 2 ),C_MKTSEGMENT CHAR ( 10 ),C_COMMENT VARCHAR ( 117 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE ORDERS (O_ORDERKEY INTEGER,O_CUSTKEY INTEGER,O_ORDERSTATUS CHAR ( 1 ),O_TOTALPRICE DECIMAL ( 15, 2 ),O_ORDERDATE DATE,O_ORDERPRIORITY CHAR ( 15 ),O_CLERK CHAR ( 15 ),O_SHIPPRIORITY INTEGER,O_COMMENT VARCHAR ( 79 ) ) ROW format delimited fields terminated BY '|'; CREATE TABLE LINEITEM (L_ORDERKEY INTEGER,L_PARTKEY INTEGER,L_SUPPKEY INTEGER,L_LINENUMBER INTEGER,L_QUANTITY DECIMAL ( 15, 2 ),L_EXTENDEDPRICE DECIMAL ( 15, 2 ),L_DISCOUNT DECIMAL ( 15, 2 ),L_TAX DECIMAL ( 15, 2 ),L_RETURNFLAG CHAR ( 1 ),L_LINESTATUS CHAR ( 1 ),L_SHIPDATE DATE,L_COMMITDATE DATE,L_RECEIPTDATE DATE,L_SHIPINSTRUCT CHAR ( 25 ),L_SHIPMODE CHAR ( 10 ),L_COMMENT VARCHAR ( 44 ) ) ROW format delimited fields terminated BY '|';MySQL
mysql需要将Integer等数据类型改成bigint或者int,根据需要测试的数据量来定。
通过tpch-dbgen-master下编译出来的dbgen生成数据,具体语法如下:
./dbgen -s 1000 -C 1 -f &
数据量的大小对查询速度有直接的影响,TPC-H中使用SF描述数据量,1SF对应1 GB单位。1000SF,即1 TB。1SF对应的数据量只是8个表的总数据量不包括索引等空间占用,准备数据时需预留更多空间。
-s表示sf值,如-s 1表示生成1G数据:
-s 0.16最大表lineitem约96万条数据
-s 1.6最大表lineitem约960万条数据
-s 16最大表lineitem约0.96亿条数据
-s 20最大表lineitem约1.2亿条数据
-S表示当前命令生成第几个 chunk。
-C表示一共分成几个chunk。一条语句只能生成一个 chunk。
-f表示强制生成,会替换之前生成的.tbl文件更多dbgen命令说明请使用./dbgen --help查看:
-C
– separate data set into chunks (requires -S, default: 1)
-f – force. Overwrite existing files
-h – display this message
-q – enable QUIET mode
-s– set Scale Factor (SF) to (default: 1)
-S– build the th step of the data/update set (used with -C or -U)
-U– generate update sets
-v – enable VERBOSE mode
-b– load distributions for(default: dists.dss)
-d– split deletes between files (requires -U)
-i– split inserts between files (requires -U)
-T c – generate cutomers ONLY
-T l – generate nation/region ONLY
-T L – generate lineitem ONLY
-T n – generate nation ONLY
-T o – generate orders/lineitem ONLY
-T O – generate orders ONLY
-T p – generate parts/partsupp ONLY
-T P – generate parts ONLY
-T r – generate region ONLY
-T s – generate suppliers ONLY
-T S – generate partsupp ONLYTo generate the SF=1 (1GB), validation database population, use:
dbgen -vf -s 1
To generate updates for a SF=1 (1GB), use:
dbgen -v -U 1 -s 1
加载数据文件到DB
作者的工作路径是/opt/perfor_test/tpch-dbgen-master,下文中的命令请根据实际路径修改。如果是windows系统,则使用D:\customer.tbl之类的路径名。
PGcopy customer from '/opt/perfor_test/tpch-dbgen-master/customer.tbl' DELIMITER '|'; copy lineitem from '/opt/perfor_test/tpch-dbgen-master/lineitem.tbl' DELIMITER '|'; copy orders from '/opt/perfor_test/tpch-dbgen-master/orders.tbl' DELIMITER '|'; copy partsupp from '/opt/perfor_test/tpch-dbgen-master/partsupp.tbl' DELIMITER '|'; copy part from '/opt/perfor_test/tpch-dbgen-master/part.tbl' DELIMITER '|'; copy supplier from '/opt/perfor_test/tpch-dbgen-master/supplier.tbl' DELIMITER '|'; copy nation from '/opt/perfor_test/tpch-dbgen-master/nation.tbl' DELIMITER '|'; copy region from '/opt/perfor_test/tpch-dbgen-master/region.tbl' DELIMITER '|';HIVE
load data local inpath '/opt/perfor_test/tpch-dbgen-master/customer.tbl' overwrite into table customer; load data local inpath '/opt/perfor_test/tpch-dbgen-master/lineitem.tbl' overwrite into table lineitem; load data local inpath '/opt/perfor_test/tpch-dbgen-master/nation.tbl' overwrite into table nation; load data local inpath '/opt/perfor_test/tpch-dbgen-master/orders.tbl' overwrite into table orders; load data local inpath '/opt/perfor_test/tpch-dbgen-master/partsupp.tbl' overwrite into table partsupp; load data local inpath '/opt/perfor_test/tpch-dbgen-master/part.tbl' overwrite into table part; load data local inpath '/opt/perfor_test/tpch-dbgen-master/region.tbl' overwrite into table region; load data local inpath '/opt/perfor_test/tpch-dbgen-master/supplier.tbl' overwrite into table supplier;MySQL
LOAD DATA LOCAL INFILE 'D:\customer.tbl' INTO TABLE customer FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\lineitem.tbl' INTO TABLE lineitem FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\nation.tbl' INTO TABLE nation FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\orders.tbl' INTO TABLE orders FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\partsupp.tbl' INTO TABLE partsupp FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\part.tbl' INTO TABLE part FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\region.tbl' INTO TABLE region FIELDS TERMINATED BY '|'; LOAD DATA LOCAL INFILE 'D:\supplier.tbl' INTO TABLE supplier FIELDS TERMINATED BY '|';
标准测试SQL
这里的标准SQL可能不适用与所有数据库,请根据实际情况修改。
-- Q1 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '93 day' group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus; -- Q2 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 23 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100; -- Q3 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'MACHINERY' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-24' and l_shipdate > date '1995-03-24' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10; -- Q4 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1996-08-01' and o_orderdate < date '1996-08-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority; -- Q6 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between 0.06 - 0.01 and 0.06 + 0.01 and l_quantity < 24; -- Q7 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'JORDAN' and n2.n_name = 'INDONESIA') or (n1.n_name = 'INDONESIA' and n2.n_name = 'JORDAN') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year; -- Q8 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select o_year, sum(case when nation = 'INDONESIA' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'STANDARD BRUSHED BRASS' ) as all_nations group by o_year order by o_year; -- Q9 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%chartreuse%' ) as profit group by nation, o_year order by nation, o_year desc; -- Q10 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1994-08-01' and o_orderdate < date '1994-08-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20; -- Q11 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'INDONESIA' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001000000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'INDONESIA' ) order by value desc; -- Q12 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('REG AIR', 'TRUCK') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' year group by l_shipmode order by l_shipmode; -- Q13 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%pending%requests%' group by c_custkey ) as c_orders (c_custkey, c_count) group by c_count order by custdist desc, c_count desc; -- Q14 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1994-11-01' and l_shipdate < date '1994-11-01' + interval '1' month; -- Q15 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; create view revenue0 (supplier_no, total_revenue) as select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1997-10-01' and l_shipdate < date '1997-10-01' + interval '3' month group by l_suppkey; select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey; drop view revenue0; -- Q16 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#44' and p_type not like 'SMALL BURNISHED%' and p_size in (36, 27, 34, 45, 11, 6, 25, 16) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size; -- Q17 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#42' and p_container = 'JUMBO PACK' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); -- Q18 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 312 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100; -- Q19 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#43' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 5 and l_quantity <= 5 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#45' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 12 and l_quantity <= 12 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#11' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 24 and l_quantity <= 24 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); -- Q20 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'magenta%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'RUSSIA' order by s_name; -- Q21 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'MOZAMBIQUE' group by s_name order by numwait desc, s_name limit 100; -- Q22 -- 开启向量加速引擎,并设置开关变量为on set laser.enable = on; select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;参考
阿里基于TPC-H测试说明1阿里基于TPC-H测试说明2
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)