day 48 hive 一些习题

day 48 hive 一些习题,第1张

day 48 hive 一些习题

文章目录
  • 一、hive 习题19
      • 1、列出至少有一个员工的所有部门
      • 2、列出薪金比“SMITH”多的所有员工
      • 3、列出所有员工的姓名及其直接上级的姓名
      • 4、列出受雇日期早于其直接上级的所有员工
      • 5、列出部门名称和这些部门的员工信息,同时列出那些没有员工的部门
      • 6、列出所有“CLERK”(办事员)的姓名及其部门名称
      • 7、 列出最低薪金大于1500的各种工作
      • 8、列出在部门“SALES”(销售部)工作的员工的姓名,不知道销售部的部门编号
      • 9、列出薪金高于公司平均薪金的所有员工
      • 10、列出与“SCOTT”从事相同工作的所有员工
      • 11、列出薪金等于部门30中员工的薪金的所有员工的姓名和薪金
      • 12、列出薪金高于在部门30工作的所有员工的薪金的员工姓名和薪金
      • 13、列出在每个部门工作的员工数量、平均工资和平均服务期限
      • 14、列出所有员工的姓名、部门名称和工资
      • 15、列出所有部门的详细信息和部门人数
      • 16、列出各种工作的最低工资
      • 17、列出各个部门的MANAGER(经理)的最低薪金
      • 18、列出所有员工的年工资,按年薪从低到高排序
      • 19、列出每个部门薪水前两名最高的人员名称以及薪水
  • 二、其他?
      • 1、连续登陆天数,等等
  • 总结


I know, i know
地球另一端有你陪我




一、hive 习题19

员工信息表emp:

字段:员工id,员工名字,工作岗位,部门经理,受雇日期,薪水,奖金,部门编号
英文名:EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO

create table emp(
    EMPNO int
    ,ENAME string
    ,JOB string
    ,MGR int
    ,HIREDATE string
    ,SAL int
    ,BonUS int
    ,DEPTNO int
) 
row format delimited
fields terminated by ',';
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02,2975,null,20,
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30
7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10
7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20
7839,KING,PRESIDENT,null,1981-11-17,5000,null,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-03,3000,null,20
7934,MILLER,CLERK,7782,1982-01-23,1300,null,10

部门信息表dept:

字段:员工id,员工名字,工作岗位,部门经理,受雇日期,薪水,奖金,部门编号
英文名:EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO‘

create table dept(
    DEPTNO int
    ,DEPTNAME string
    ,DEPTADDR string
) 
row format delimited
fields terminated by ',';
10,ACCOUNTING,NEW YORK
10,ACCOUNTING,shanghai
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
1、列出至少有一个员工的所有部门
select distinct dept.deptname from emp
inner join dept
on emp.deptno=dept.deptno;
2、列出薪金比“SMITH”多的所有员工
select t1.ename,t1.sal from emp as t1
inner join (
select sal from emp where ename='SMITH') as t2
where t1.sal>t2.sal;
3、列出所有员工的姓名及其直接上级的姓名
select 
emp1.ename,
emp1.empno,
emp1.mgr as upno,
emp2.ename as upname
from emp as emp1
left join emp as emp2
on emp1.mgr=emp2.empno;
4、列出受雇日期早于其直接上级的所有员工
select 
emp1.ename,
emp1.empno,
emp1.mgr as upno,
emp2.ename as upname
from emp as emp1
left join emp as emp2
on emp1.mgr=emp2.empno
where emp1.hiredate 
5、列出部门名称和这些部门的员工信息,同时列出那些没有员工的部门 
select
dept.deptno
,dept.deptname
,emp.ename
,emp.job
from emp
right join dept
on emp.deptno=dept.deptno
order by deptno;
6、列出所有“CLERK”(办事员)的姓名及其部门名称
select t1.ename,t1.job,dept.deptno,dept.deptname from (
select emp.ename,emp.deptno,emp.job from emp
where emp.job='CLERK') as t1
inner join dept
on t1.deptno=dept.deptno;
7、 列出最低薪金大于1500的各种工作
select job,sal from emp
where sal>1500;
8、列出在部门“SALES”(销售部)工作的员工的姓名,不知道销售部的部门编号
with t1 as (select deptno from dept where deptname='SALES')
select ename from emp where deptno=t1;

和第二题一样

9、列出薪金高于公司平均薪金的所有员工
select t1.*,t1.avgsal from(
select *,avg(sal) over() as avgsal from emp) as t1
where t1.sal>t1.avgsal;
10、列出与“SCOTT”从事相同工作的所有员工
select t1.* from emp as t1
inner join (
select job from emp
where ename='SCOTT') as t2
where t1.job=t2.job and t1.ename<>'SCOTT';
11、列出薪金等于部门30中员工的薪金的所有员工的姓名和薪金
where in(...)
12、列出薪金高于在部门30工作的所有员工的薪金的员工姓名和薪金
select t1.ename,t1.sal from emp as t1
inner join (
select max(sal) as max from emp
where deptno=30) as t2
where t1.sal>t2.max;
13、列出在每个部门工作的员工数量、平均工资和平均服务期限
select
emp.deptno
,count(*) as number
,floor(avg(sal)) as avgsal
,floor(avg(datediff(current_date,hiredate))) as avgworkday
from emp
inner join dept
on emp.deptno=dept.deptno
group by emp.deptno;
14、列出所有员工的姓名、部门名称和工资
select emp.ename,dept.deptname,emp.sal from emp
inner join dept
on emp.deptno=dept.deptno;
15、列出所有部门的详细信息和部门人数
select dept.*,count(emp.ename) over(partition by dept.deptno) from dept
left join emp
on emp.deptno=dept.deptno;
16、列出各种工作的最低工资
select
job,min(sal)
from emp
group by job;
17、列出各个部门的MANAGER(经理)的最低薪金
select t1.* from emp as t1
inner join (
select deptno,min(sal) as min from emp
where job='MANAGER'
group by deptno) as t2
on t1.deptno=t2.deptno and t1.sal=t2.min
where t1.job='MANAGER';
18、列出所有员工的年工资,按年薪从低到高排序
select ename,sal*12+if(bonus is null,0,bonus) as money
from emp
order by money;
19、列出每个部门薪水前两名最高的人员名称以及薪水
with t as (select *,dense_rank()
over(partition by deptno order by sal) as rank from emp)

select ename,sal,deptno from t where rank<=3;

二、其他?
1、连续登陆天数,等等

统计用户连续交易的总额、连续登陆天数、连续登陆开始和结束时间、间隔天数等

id	datestr	  amount(金额)
1,2019-02-08,6214.23 
1,2019-02-08,6247.32 
1,2019-02-09,85.63 
1,2019-02-09,967.36 
1,2019-02-10,85.69 
1,2019-02-12,769.85 
1,2019-02-13,943.86 
1,2019-02-14,538.42
1,2019-02-15,369.76
1,2019-02-16,369.76
1,2019-02-18,795.15
1,2019-02-19,715.65
1,2019-02-21,537.71
2,2019-02-08,6214.23 
2,2019-02-08,6247.32 
2,2019-02-09,85.63 
2,2019-02-09,967.36 
2,2019-02-10,85.69 
2,2019-02-12,769.85 
2,2019-02-13,943.86 
2,2019-02-14,943.18
2,2019-02-15,369.76
2,2019-02-18,795.15
2,2019-02-19,715.65
2,2019-02-21,537.71
3,2019-02-08,6214.23 
3,2019-02-08,6247.32 
3,2019-02-09,85.63 
3,2019-02-09,967.36 
3,2019-02-10,85.69 
3,2019-02-12,769.85 
3,2019-02-13,943.86 
3,2019-02-14,276.81
3,2019-02-15,369.76
3,2019-02-16,369.76
3,2019-02-18,795.15
3,2019-02-19,715.65
3,2019-02-21,537.71

1 先去重合并数据:t

select
    id
    ,datestr
    ,round(sum(amount)) as sum_amount
from deal group by id,datestr;

2 添加列作为基准:t1

select
    *
    ,row_number() over(partition by id order by datestr) as row1
from t;

3 做个差,得到 origin 日期:t2

select
    *
    ,date_sub(datestr,row1) as origin
from t1;

4 分组,得到连续的天数和期间的连续消费:t3

select
    id
    ,count(origin) as continuity
    ,round(sum(sum_amount))
from t2 group by id,origin;

5 追加,连续天数的起始和结束时间:t3

select
    id
    ,count(origin) as continuity
    ,round(sum(sum_amount))
    ,min(datestr)
    ,max(datestr)
from t2 group by id,origin;

6 期间间隔天数,需要使用开窗帧,向下或向上取一个

LAG(col,n):往前第n行数据
LEAD(col,n):往后第n行数据

select
*
,datediff(min,lag(max,1) over(partition by id order by max)) as logoutdays
from t3;

整合一下,改改就行了

with t as
(select
    id
    ,datestr
    ,round(sum(amount)) as sum_amount
from deal group by id,datestr),

t1 as
(select 
    *
    ,row_number() over(partition by id order by datestr) as row1
from t),

t2 as
(select 
    *
    ,date_sub(datestr,row1) as origin 
from t1),

t3 as(select id
    ,count(origin) as continuity
    ,round(sum(sum_amount))
    ,min(datestr) as min
    ,max(datestr) as max
from t2 group by id,origin)

select
*
,datediff(min,lag(max,1) over(partition by id order by max)) as logout
from t3;

总结

1、hive 里,不能在 where 中写查询子句

2、并列的函数字段,不能在同一级的查询语句中被引用

欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/zaji/5610673.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-12-15
下一篇 2022-12-15

发表评论

登录后才能评论

评论列表(0条)

保存