- 一、hive 习题19
- 1、列出至少有一个员工的所有部门
- 2、列出薪金比“SMITH”多的所有员工
- 3、列出所有员工的姓名及其直接上级的姓名
- 4、列出受雇日期早于其直接上级的所有员工
- 5、列出部门名称和这些部门的员工信息,同时列出那些没有员工的部门
- 6、列出所有“CLERK”(办事员)的姓名及其部门名称
- 7、 列出最低薪金大于1500的各种工作
- 8、列出在部门“SALES”(销售部)工作的员工的姓名,不知道销售部的部门编号
- 9、列出薪金高于公司平均薪金的所有员工
- 10、列出与“SCOTT”从事相同工作的所有员工
- 11、列出薪金等于部门30中员工的薪金的所有员工的姓名和薪金
- 12、列出薪金高于在部门30工作的所有员工的薪金的员工姓名和薪金
- 13、列出在每个部门工作的员工数量、平均工资和平均服务期限
- 14、列出所有员工的姓名、部门名称和工资
- 15、列出所有部门的详细信息和部门人数
- 16、列出各种工作的最低工资
- 17、列出各个部门的MANAGER(经理)的最低薪金
- 18、列出所有员工的年工资,按年薪从低到高排序
- 19、列出每个部门薪水前两名最高的人员名称以及薪水
- 二、其他?
- 1、连续登陆天数,等等
- 总结
I know, i know
地球另一端有你陪我
一、hive 习题19
员工信息表emp:
字段:员工id,员工名字,工作岗位,部门经理,受雇日期,薪水,奖金,部门编号
英文名:EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO
create table emp( EMPNO int ,ENAME string ,JOB string ,MGR int ,HIREDATE string ,SAL int ,BonUS int ,DEPTNO int ) row format delimited fields terminated by ',';
7369,SMITH,CLERK,7902,1980-12-17,800,null,20 7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30 7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30 7566,JONES,MANAGER,7839,1981-04-02,2975,null,20, 7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30 7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30 7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10 7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20 7839,KING,PRESIDENT,null,1981-11-17,5000,null,10 7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30 7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20 7900,JAMES,CLERK,7698,1981-12-03,950,null,30 7902,FORD,ANALYST,7566,1981-12-03,3000,null,20 7934,MILLER,CLERK,7782,1982-01-23,1300,null,10
部门信息表dept:
字段:员工id,员工名字,工作岗位,部门经理,受雇日期,薪水,奖金,部门编号
英文名:EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO‘
create table dept( DEPTNO int ,DEPTNAME string ,DEPTADDR string ) row format delimited fields terminated by ',';
10,ACCOUNTING,NEW YORK 10,ACCOUNTING,shanghai 20,RESEARCH,DALLAS 30,SALES,CHICAGO 40,OPERATIONS,BOSTON1、列出至少有一个员工的所有部门
select distinct dept.deptname from emp inner join dept on emp.deptno=dept.deptno;2、列出薪金比“SMITH”多的所有员工
select t1.ename,t1.sal from emp as t1 inner join ( select sal from emp where ename='SMITH') as t2 where t1.sal>t2.sal;3、列出所有员工的姓名及其直接上级的姓名
select emp1.ename, emp1.empno, emp1.mgr as upno, emp2.ename as upname from emp as emp1 left join emp as emp2 on emp1.mgr=emp2.empno;4、列出受雇日期早于其直接上级的所有员工
select emp1.ename, emp1.empno, emp1.mgr as upno, emp2.ename as upname from emp as emp1 left join emp as emp2 on emp1.mgr=emp2.empno where emp1.hiredate5、列出部门名称和这些部门的员工信息,同时列出那些没有员工的部门 select dept.deptno ,dept.deptname ,emp.ename ,emp.job from emp right join dept on emp.deptno=dept.deptno order by deptno;6、列出所有“CLERK”(办事员)的姓名及其部门名称select t1.ename,t1.job,dept.deptno,dept.deptname from ( select emp.ename,emp.deptno,emp.job from emp where emp.job='CLERK') as t1 inner join dept on t1.deptno=dept.deptno;7、 列出最低薪金大于1500的各种工作select job,sal from emp where sal>1500;8、列出在部门“SALES”(销售部)工作的员工的姓名,不知道销售部的部门编号with t1 as (select deptno from dept where deptname='SALES') select ename from emp where deptno=t1;和第二题一样
9、列出薪金高于公司平均薪金的所有员工select t1.*,t1.avgsal from( select *,avg(sal) over() as avgsal from emp) as t1 where t1.sal>t1.avgsal;10、列出与“SCOTT”从事相同工作的所有员工select t1.* from emp as t1 inner join ( select job from emp where ename='SCOTT') as t2 where t1.job=t2.job and t1.ename<>'SCOTT';11、列出薪金等于部门30中员工的薪金的所有员工的姓名和薪金where in(...)12、列出薪金高于在部门30工作的所有员工的薪金的员工姓名和薪金select t1.ename,t1.sal from emp as t1 inner join ( select max(sal) as max from emp where deptno=30) as t2 where t1.sal>t2.max;13、列出在每个部门工作的员工数量、平均工资和平均服务期限select emp.deptno ,count(*) as number ,floor(avg(sal)) as avgsal ,floor(avg(datediff(current_date,hiredate))) as avgworkday from emp inner join dept on emp.deptno=dept.deptno group by emp.deptno;14、列出所有员工的姓名、部门名称和工资select emp.ename,dept.deptname,emp.sal from emp inner join dept on emp.deptno=dept.deptno;15、列出所有部门的详细信息和部门人数select dept.*,count(emp.ename) over(partition by dept.deptno) from dept left join emp on emp.deptno=dept.deptno;16、列出各种工作的最低工资select job,min(sal) from emp group by job;17、列出各个部门的MANAGER(经理)的最低薪金select t1.* from emp as t1 inner join ( select deptno,min(sal) as min from emp where job='MANAGER' group by deptno) as t2 on t1.deptno=t2.deptno and t1.sal=t2.min where t1.job='MANAGER';18、列出所有员工的年工资,按年薪从低到高排序select ename,sal*12+if(bonus is null,0,bonus) as money from emp order by money;19、列出每个部门薪水前两名最高的人员名称以及薪水with t as (select *,dense_rank() over(partition by deptno order by sal) as rank from emp) select ename,sal,deptno from t where rank<=3;
二、其他?
1、连续登陆天数,等等统计用户连续交易的总额、连续登陆天数、连续登陆开始和结束时间、间隔天数等
id datestr amount(金额) 1,2019-02-08,6214.23 1,2019-02-08,6247.32 1,2019-02-09,85.63 1,2019-02-09,967.36 1,2019-02-10,85.69 1,2019-02-12,769.85 1,2019-02-13,943.86 1,2019-02-14,538.42 1,2019-02-15,369.76 1,2019-02-16,369.76 1,2019-02-18,795.15 1,2019-02-19,715.65 1,2019-02-21,537.71 2,2019-02-08,6214.23 2,2019-02-08,6247.32 2,2019-02-09,85.63 2,2019-02-09,967.36 2,2019-02-10,85.69 2,2019-02-12,769.85 2,2019-02-13,943.86 2,2019-02-14,943.18 2,2019-02-15,369.76 2,2019-02-18,795.15 2,2019-02-19,715.65 2,2019-02-21,537.71 3,2019-02-08,6214.23 3,2019-02-08,6247.32 3,2019-02-09,85.63 3,2019-02-09,967.36 3,2019-02-10,85.69 3,2019-02-12,769.85 3,2019-02-13,943.86 3,2019-02-14,276.81 3,2019-02-15,369.76 3,2019-02-16,369.76 3,2019-02-18,795.15 3,2019-02-19,715.65 3,2019-02-21,537.711 先去重合并数据:t
select id ,datestr ,round(sum(amount)) as sum_amount from deal group by id,datestr;2 添加列作为基准:t1
select * ,row_number() over(partition by id order by datestr) as row1 from t;3 做个差,得到 origin 日期:t2
select * ,date_sub(datestr,row1) as origin from t1;4 分组,得到连续的天数和期间的连续消费:t3
select id ,count(origin) as continuity ,round(sum(sum_amount)) from t2 group by id,origin;5 追加,连续天数的起始和结束时间:t3
select id ,count(origin) as continuity ,round(sum(sum_amount)) ,min(datestr) ,max(datestr) from t2 group by id,origin;6 期间间隔天数,需要使用开窗帧,向下或向上取一个
LAG(col,n):往前第n行数据
LEAD(col,n):往后第n行数据select * ,datediff(min,lag(max,1) over(partition by id order by max)) as logoutdays from t3;整合一下,改改就行了
with t as (select id ,datestr ,round(sum(amount)) as sum_amount from deal group by id,datestr), t1 as (select * ,row_number() over(partition by id order by datestr) as row1 from t), t2 as (select * ,date_sub(datestr,row1) as origin from t1), t3 as(select id ,count(origin) as continuity ,round(sum(sum_amount)) ,min(datestr) as min ,max(datestr) as max from t2 group by id,origin) select * ,datediff(min,lag(max,1) over(partition by id order by max)) as logout from t3;
总结1、hive 里,不能在 where 中写查询子句
2、并列的函数字段,不能在同一级的查询语句中被引用
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)