Hive累计报表

目录
  1. 1. 准备
    1. 1.1. 创建数据文件
    2. 1.2. 创建表及读取数据
  2. 2. 1、先求每个品牌的月总金额
  3. 3. 2、将月总金额自连接
  4. 4. 3、从上一步的结果中进行分组查询

[TOC]

在 hive 做统计的时候,总是涉及到做累计的报表处理,下面案列就是来做相应处理

准备

创建数据文件

(在hadoop所在的机器)

vim /usr/hadoop/hivedata/t_sales.dat

1
2
3
4
5
6
7
8
9
10
舒肤佳,2018-06,5
舒肤佳,2018-06,15
美姿,2018-06,5
舒肤佳,2018-06,8
美姿,2018-06,25
舒肤佳,2018-06,5
舒肤佳,2018-07,4
舒肤佳,2018-07,6
美姿,2018-07,10
美姿,2018-07,5

上传到hdfs

1
hadoop fs -put /usr/hadoop/hivedata/t_sales.dat /local/hivedata/t_sales.dat

创建表及读取数据

1
2
3
create table t_sales(brandname string,month string,sales int)
row format delimited fields terminated by ',';
load data inpath '/local/hivedata/t_sales.dat' into table t_sales;

如果是上传本地文件(如果在hive所在主机上) 则在load data 后加 local,如 load data local inpath '/usr/hadoop/hivedata/t_sales.dat' into table t_sales;

1、先求每个品牌的月总金额

1
select brandname,month,sum(sales) as all_sales from t_sales group by brandname,month

2、将月总金额自连接

1
2
3
4
5
6
7
select * from  (select brandname,month,sum(sales) as sal from t_sales group by brandname,month) A 
inner join
(select brandname,month,sum(sales) as sal from t_sales group by brandname,month) B
on
A.brandname=B.brandname
where
B.month <= A.month;

3、从上一步的结果中进行分组查询

分组的字段是 a.brandname a.month

求月累计值: 将 b.month <= a.month 的所有 b.sals求和即可

1
2
3
4
5
6
7
8
9
10
select A.brandname,A.month,max(A.sales) as sales,sum(B.sales) as accumulate
from
(select brandname,month,sum(sales) as sales from t_sales group by brandname,month) A
inner join
(select brandname,month,sum(sales) as sales from t_sales group by brandname,month) B
on
A.brandname=B.brandname
where B.month <= A.month
group by A.brandname,A.month
order by A.brandname,A.month;

评论