Spark window functions: an example of ranking multiple columns


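Suppose we want to select each student's id together with the Chinese score and its rank, the math score and its rank, and the English score and its rank.
Window functions can do this in a single query.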

## Create the table
create table t_window(id string, chinese int, math int, english int);

## Insert data
insert into t_window 
values
('1', 99, 88, 77),
('2', 77, 99, 88), 
('3', 88, 77, 99);

## Query the data, with the final result sorted by student ID
select id,
       chinese, row_number() over (order by chinese desc) as chinese_order,
       math, row_number() over (order by math desc) as math_order,
       english, row_number() over (order by english desc) as english_order
from t_window
order by id;

The result is as follows:

Student ID	Chinese score	Chinese rank	Math score	Math rank	English score	English rank
1	99	1	88	2	77	3
2	77	3	99	1	88	2
3	88	2	77	3	99	1
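
To check the ranking logic without a Spark cluster, the same statements can be run against an in-memory SQLite database from Python's standard library. This is only a sketch of the SQL semantics: SQLite stands in for Spark here (an assumption made for convenience, not part of the original setup), it needs SQLite 3.25 or newer for window functions, and it says nothing about how Spark executes the query.

```python
import sqlite3

# Assumption: SQLite (3.25+) replaces Spark purely to check the SQL result.
conn = sqlite3.connect(":memory:")
conn.execute("create table t_window(id text, chinese int, math int, english int)")
conn.executemany(
    "insert into t_window values (?, ?, ?, ?)",
    [("1", 99, 88, 77), ("2", 77, 99, 88), ("3", 88, 77, 99)],
)

# Same query as above: one row_number() per subject, each with its own ordering.
query = """
select id,
       chinese, row_number() over (order by chinese desc) as chinese_order,
       math,    row_number() over (order by math desc)    as math_order,
       english, row_number() over (order by english desc) as english_order
from t_window
order by id
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Each printed tuple should line up with one row of the result table above, in the same column order.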

The generated execution plan:

== Physical Plan ==
*(4) Sort [id#156 ASC NULLS FIRST], true, 0
+- *(4) Project [id#156, chinese#157, chinese_order#145, math#158, math_order#146, english#159, english_order#147]
   +- Window [row_number() windowspecdefinition(english#159 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS english_order#147], [english#159 DESC NULLS LAST]
      +- *(3) Sort [english#159 DESC NULLS LAST], false, 0
         +- Window [row_number() windowspecdefinition(math#158 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS math_order#146], [math#158 DESC NULLS LAST]
            +- *(2) Sort [math#158 DESC NULLS LAST], false, 0
               +- Window [row_number() windowspecdefinition(chinese#157 DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS chinese_order#145], [chinese#157 DESC NULLS LAST]
                  +- *(1) Sort [chinese#157 DESC NULLS LAST], false, 0
                     +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#306]
                        +- Scan hive hzz.t_window [id#156, chinese#157, math#158, english#159], HiveTableRelation [`hzz`.`t_window`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#156, chinese#157, math#158, english#159], Partition Cols: []]

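The final execution plan, read from the bottom up:
First, scan hzz.t_window.
Exchange SinglePartition moves all rows into a single partition, because the window specifications have no partition by, so all rows must be ordered together.
(1) Sort [chinese#157 DESC NULLS LAST] sorts by Chinese score in descending order.
Window [row_number() windowspecdefinition(chinese#157… generates the Chinese rank.
(2) Sort [math#158 DESC NULLS LAST] sorts by math score in descending order.
Window [row_number() windowspecdefinition(math#158… generates the math rank.
(3) Sort [english#159 DESC NULLS LAST] sorts by English score in descending order.
Window [row_number() windowspecdefinition(english#159… generates the English rank.
(4) Project selects the required columns.
(4) Sort [id#156 ASC NULLS FIRST] sorts the final result globally by student ID.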
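
The steps of the plan can be imitated in plain Python to make the data flow concrete. This is a rough sketch, not Spark's implementation: the helper `sort_then_number` is a hypothetical name introduced here, a Python list plays the role of the single partition, and each Sort + Window pair is modelled as "sort descending, then number the rows from 1".

```python
# Rows as scanned from t_window: (id, chinese, math, english).
scanned = [("1", 99, 88, 77), ("2", 77, 99, 88), ("3", 88, 77, 99)]


def sort_then_number(rows, key, out_name):
    """Hypothetical helper: sort rows by `key` descending, then attach 1-based row numbers."""
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    return [{**r, out_name: n} for n, r in enumerate(ordered, start=1)]


# Exchange SinglePartition: all rows end up in one list (one partition).
partition = [dict(zip(("id", "chinese", "math", "english"), r)) for r in scanned]

# Three Sort + Window pairs, applied one after another on the same partition.
partition = sort_then_number(partition, "chinese", "chinese_order")
partition = sort_then_number(partition, "math", "math_order")
partition = sort_then_number(partition, "english", "english_order")

# Project the output columns, then apply the final Sort by id.
columns = ["id", "chinese", "chinese_order", "math", "math_order", "english", "english_order"]
result = sorted(([r[c] for c in columns] for r in partition), key=lambda r: r[0])

for row in result:
    print(row)
```

The sketch also shows why the plan contains three separate sorts: each `row_number()` uses a different ordering, so the rows have to be re-sorted before each window is evaluated.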
