MySQL 合并查询union 查询出的行合并到一个表中

来源：cnblogs　　作者：酸菜鱼土豆大侠　　时间：2023/2/1 12:17:13　　对本文有异议

在合并查询中，尤其是二分类的情况，在查询结果是相同列名的时候可以考虑合并查询。先查询出行的结果，再使用union或者union all合并查询结果。

另外如果 union 和 order by 一起使用的话要注意使用方法。

一、适用场景和方法

（1）适用场景

考虑查询过程中是否存在以下情况：

查询行时用的表不同；
查询某些行时需要where条件，某些行不需要where条件；
分类查询；
查询的结果具有相同的列名。

存在上述情况时，大多数需要合并查询。先分行查询，再将查询出的行合并到一个表中。

（2）方法

MySQL合并查询，将查询到的行（具有相同列）合并到一个表中使用union或者union all函数

具体包括：

函数	使用说明
union	出现相同行时，不保留重复行，进行去重处理
union all	出现相同行时，保留重复行，不进行去重

根据查询需要使用不同合并函数。

二、案例分析

下面用2个具体的案例（由简到难）来说明行合并的过程：

（1）简单案例

案例来自：SQL26 计算25岁以上和以下的用户数量

描述

现在运营想要将用户划分为25岁以下和25岁及以上两个年龄段，分别查看这两个年龄段用户数量

本题注意：age为null 也记为 25岁以下

示例：user_profile

id	device_id	gender	age	university	gpa	active_days_within_30	question_cnt	answer_cnt
1	2138	male	21	北京大学	3.4	7	2	12
2	3214	male		复旦大学	4	15	5	25
3	6543	female	20	北京大学	3.2	12	3	30
4	2315	female	23	浙江大学	3.6	5	1	2
5	5432	male	25	山东大学	3.8	20	15	70
6	2131	male	28	山东大学	3.3	15	7	13
7	4321	male	26	复旦大学	3.6	9	6	52

根据示例，你的查询应返回以下结果：

age_cut	number
25岁以下	4
25岁及以上	3

【分类】：合并查询、多表连接

分析思路

难点：

1.单个字符或者值可以作为一列：例如'activity2' as activity

2.用了一半时间就完成高难度试卷。两个时间相减得到分钟：timestampdiff(minute, date_expr1, date_expr2) 两个时间的差

（1）统计25岁以下学生的人数

? [条件]：where score >= 85 and year(start_time) = 2021

? [使用]：distinct。一定要去重

（2）统计25岁以上学生的人数

? [条件]：where difficulty = 'hard' and score > 80 and year(start_time) = 2021 and timestampdiff(minute, start_time, submit_time) < duration / 2

? [使用]：多表连接使用 join using( )

（3）合并两个表

? [使用]：union all 和union 都可以，因为列activity不会有重复。

最终结果

(
select 查询结果 [年龄段; 人数]
from 从哪张表中查询数据[用户表]
where 查询条件 [年龄小于25或者为空]
)
union 
(
select 查询结果 [年龄段; 人数]
from 从哪张表中查询数据[用户表]
where 查询条件 [年龄大于25]
)

该题的多种解法详见：SQL26 计算25岁以上和以下的用户数量

求解代码

union

(
#统计25岁以下学生的人数
select
    '25岁以下' as age_cut,
    count(device_id) as number
from user_profile
where age < 25 or age is null
)
union
(
#统计25岁以上学生的人数
select
    '25岁及以上' as age_cut,
    COUNT(device_id) as number
from user_profile
where age >= 25
)

（2）较难案例

案例来自：SQL132 每个题目和每份试卷被作答的人数和次数

描述

现有试卷作答记录表exam_record（uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分）：

id	uid	exam_id	start_time	submit_time	score
1	1001	9001	2021-09-01 09:01:01	2021-09-01 09:41:01	81
2	1002	9002	2021-09-01 12:01:01	2021-09-01 12:31:01	70
3	1002	9001	2021-09-01 19:01:01	2021-09-01 19:40:01	80
4	1002	9002	2021-09-01 12:01:01	2021-09-01 12:31:01	70
5	1004	9001	2021-09-01 19:01:01	2021-09-01 19:40:01	85
6	1002	9002	2021-09-01 12:01:01	(NULL)	(NULL)

题目练习表practice_record（uid用户ID, question_id题目ID, submit_time提交时间, score得分）：

id	uid	question_id	submit_time	score
1	1001	8001	2021-08-02 11:41:01	60
2	1002	8001	2021-09-02 19:30:01	50
3	1002	8001	2021-09-02 19:20:01	70
4	1002	8002	2021-09-02 19:38:01	70
5	1003	8001	2021-08-02 19:38:01	70
6	1003	8001	2021-08-02 19:48:01	90
7	1003	8002	2021-08-01 19:38:01	80

请统计每个题目和每份试卷被作答的人数和次数，分别按照"试卷"和"题目"的uv & pv降序显示，示例数据结果输出如下：

tid	uv	pv
9001	3	3
9002	1	3
8001	3	5
8002	2	2

解释：“试卷”有3人共练习3次试卷9001，1人作答3次9002；“刷题”有3人刷5次8001，有2人刷2次8002

【分类】：合并查询

分析思路

难点：

union 和 order by 一起使用需要注意的问题

（1）统计每份试卷被作答的人数和次数

? [条件]：where score >= 85 and year(start_time) = 2021

? [使用]：distinct。一定要去重

（2）统计每个题目被作答的人数和次数

? [条件]：where difficulty = 'hard' and score > 80 and year(start_time) = 2021 and timestampdiff(minute, start_time, submit_time) < duration / 2

? [使用]：多表连接使用 join using( )

（3）合并两个表,分别按照"试卷"和"题目"的uv & pv降序显示

? [使用]：union all 和union 都可以，因为列activity不会有重复。

最终结果

select * from 
(
select 查询结果 [试卷ID; 作答次数]
from 从哪张表中查询数据[试卷作答记录表]
group by 分组条件 [试卷ID]
order by 对查询结果排序 [按照"试卷"的uv & pv降序]
)
union
select * from 
(
select 查询结果 [题目ID; 作答次数]
from 从哪张表中查询数据[题目练习表]
group by 分组条件 [题目ID]
order by 对查询结果排序 [按照"题目"的uv & pv降序]
)

求解代码

方法一：

#正确代码
select * from 
(
select 
    exam_id as tid,
    count(distinct uid) as uv,
    count(uid) as pv
from exam_record a
group by exam_id
order by uv desc, pv desc
) a
union
select * from 
(
select 
    question_id as tid,
    count(distinct uid) as uv,
    count(uid) as pv
from practice_record b
group by question_id
order by uv desc, pv desc
) attr

是不是可以union两个子句之后再使用order by ? 但是这个排序要对2个表分别进行降序，就需要写成下面这样：

方法二：

使用函数

left(str,length) 函数： str左边开始的长度为 length 的子字符串，在本例中为‘9’和‘8’

解释：试卷编号以‘9’开头、题目编号以‘8’开头，对编号进行降序就是对"试卷"和"题目"分别进行排序

(
#每份试卷被作答的人数和次数
select
    exam_id as tid,
    count(distinct uid) as uv,
    count(*) as pv
from exam_record
group by exam_id
)
union
(
#每个题目被作答的人数和次数
select
    question_id as tid,
    count(distinct uid) as uv,
    count(*) as pv
from practice_record
group by question_id
)
#分别按照"试卷"和"题目"的uv & pv降序显示
order by left(tid,1) desc,uv desc,pv desc