astak16 / blog-mysql Goto Github PK
View Code? Open in Web Editor NEWLeetCode 数据库刷题
LeetCode 数据库刷题
重新格式化表,是的新的表中有一个 id
列,和对应每个月的收入列
month
取值 ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
create table department (
id int,
revenue int,
month varchar(255)
);
insert into department values
(1, 8000, 'Jan'),
(2, 9000, 'Jan'),
(3, 10000, 'Feb'),
(1, 7000, 'Feb'),
(1, 6000, 'Mar');
select id,
sum(case month when 'Jan' then revenue end) as 'Jan_Revenue',
sum(case month when 'Feb' then revenue end) as 'Feb_Revenue',
sum(case month when 'Mar' then revenue end) as 'Mar_Revenue',
sum(case month when 'Apr' then revenue end) as 'Apr_Revenue',
sum(case month when 'May' then revenue end) as 'May_Revenue',
sum(case month when 'Jun' then revenue end) as 'Jun_Revenue',
sum(case month when 'Jul' then revenue end) as 'Jul_Revenue',
sum(case month when 'Aug' then revenue end) as 'Aug_Revenue',
sum(case month when 'Sep' then revenue end) as 'Sep_Revenue',
sum(case month when 'Oct' then revenue end) as 'Oct_Revenue',
sum(case month when 'Nov' then revenue end) as 'Nov_Revenue',
sum(case month when 'Dec' then revenue end) as 'Dec_Revenue'
from department group by id
department
表中存储这所有人所有月的收入,这里的需求是将 department
的 month
列拆成具体的月份。具体实现:
department
按照 id
进行分组case month when 'Jan' then revenue end
计算出一月份的收入
if(month = 'Jan', revenue, null)
group by
需要使用聚合函数,这里的聚合函数可以用 max
、 min
、 sum
等编写一个 SQL 查询,获取 employee
表中第二高的薪水(salary)。
create table employee (
id int primary key auto_increment,
salary int
);
insert into employee (salary) values(100),(200),(300);
select e.salary from (
select
salary,
case when salary is not null then @rank:= @rank + 1 end as 排名
from (select @rank:=0) as init, (
select salary from employee group by salary order by salary desc
) as t
) as e where e.排名 = 2;
salary
进行分组并且降序@rank
计算出排名排名 = 2
的数据select salary from employee group by salary order by salary desc limit 1, 1;
排序后使用 limit
截取第二条数据
题目链接:换座位
改变相邻两同学的座位,如果学生人数是奇数,则最后一位同学不需要换座位。
create table seat (
id int,
student varchar(255)
);
insert into seat values
(1, 'Doris'),
(2, 'Abbot'),
(3, 'Green'),
(4, 'Emerson'),
(5, 'Jeames');
select
(
case
when mod(id, 2) != 0 and counts != id then id + 1
when mod(id, 2) != 0 and counts = id then id
else id - 1
end
) as id,
student
from seat,
(select count(*) as counts from seat) as count
order by id asc;
mod(n, m)
对 n
取余。counts
mod
函数对 id
取余
0
,说明是偶数,则 id - 1
0
,说明是奇数,并且 id
不等于 counts
,则 id + 1
0
,说明是奇数,并且 id
等于 counts
,则 id
不变select s.id,
if(mod(s.id, 2) = 1, s.back, s.ahead) as student
from (
select
id, student,
lag(student, 1, student) over() as ahead,
lead(student, 1, student) over() as back
from seat
) as s;
窗口函数 lag
从前往后取,跳过第一行, null
用 seat.student
来填充
id | student | ahead |
---|---|---|
1 | Doris | Doris |
2 | Abbot | Doris |
3 | Green | Abbot |
4 | Emerson | Green |
5 | Jeames | Emerson |
窗口函数 lead
从后往前取,跳过最后一行, null
用 seat.student
来填充
id | student | back |
---|---|---|
1 | Doris | Abbot |
2 | Abbot | Green |
3 | Green | Emerson |
4 | Emerson | Jeames |
5 | Jeames | Jeames |
再通过 mod
判断奇偶数,奇数取 back
,偶数取 ahead
。
create table employee(
id int,
month int,
salary int
)
insert into employee values
(1,1,20),
(2,1,20),
(1,2,30),
(2,2,30),
(3,2,40),
(1,3,40),
(3,3,60),
(1,4,60),
(3,4,70),
(1,7,90),
(1,8,90);
窗口函数的作用是用来简化排名问题,否则将会写一大堆晦涩难懂的代码。
窗口函数与聚合函数的区别是:
排名函数: rank()
、 row_number()
、 dense_rank()
,他们的用法可以看
分布函数: percent_rank()
、cume_dist()
偏移量函数: lead()
、lag()
一些其他函数:first_value()
、last_value()
、nth_value()
、ntile()
这些函数的详细介在这里:各种窗口函数
聚合函数也可以配合窗口函数使用,起到“累加/累计”的作用,截止到当前行的 和/最大值/最小值/平均值/个数
:
select *, sum(column) over(partition by id order by age) from table_name;
聚合函数和专用函数有一个显著的区别:聚合函数需要指定列名,专用函数不需要指定列名。
语法: rows n preceding between n following
,precding
在前, following
在后,否则会报错, current row
前后都可以
rows
:行号为基准range
:以 order by
为基准n preceding
: 前 n
行n following
:后 n
行current row
:当前行unbounded preceding
:窗口第一行unbounded following
:窗口最后一行n preceding
当前行的前 n
行
当 n = 1
salary
计算salary
计算select
id, month, salary,
sum(salary) over(partition by id order by month range 1 preceding) w_salary
from employee
n following
当前行的后 n
行,和 preceding
效果一样
如果单独使用 n following
会报语法错误
select
id, month, salary,
sum(salary) over(partition by id order by month range 1 following) w_salary
from employee
current row
当前行,只计算当前行,和 w_salary
和 salary
一样
select
id, month, salary,
sum(salary) over(partition by id order by month range current row) w_salary
from employee
unbounded preceding
窗口第一行salar
计算
select
id, month, salary,
sum(salary) over(partition by id
order by month
range unbounded preceding
) w_salary
from employee
unbounded following
:窗口最后一行,和 unbounded preceding
效果一样select
id, month, salary,
sum(salary) over(partition by id
order by month
range unbounded preceding
) w_salary
from employee
between 2 preceding and 2 following
salary
累加
select
id, month, salary,
sum(salary) over(partition by id
order by month
range between 2 preceding and 2 following
) w_salary
from employee
rows
和 range
的区别rows
是每一行在表中的实际位置,可以脱离 order by
运行w_salary
是将当前行和下面两行的 salary
进行计算。id = 1, month = 7, 8
没有和 id = 2, month = 1
的 salary
进行累加,是因为这里使用 partition by id
按 id
进行分组
select
id, month, salary,
sum(salary) over(partition by id
order by month
rows between 0 preceding and 2 following
) w_salary
from employee
range
要配合 order by
使用,如果 order by <column_name>
的 column_name
不连续,在计算时也不会跳过。id = 1, month = 3, 4
, w_salary
为 100, 60
,因为没有 month = 5
,所以它不会去计算 month = 7
。
select
id, month, salary,
sum(salary) over(partition by id
order by month
range between 0 preceding and 2 following
) w_salary
from employee
range
和 rows
没有区别:
0
行到下边界值都是一样
rows between 0 preceding and unbounded following
range between 0 preceding and unbounded following
rows between current row and unbounded following
range between current row and unbounded following
0
行值都是一样
rows between unbounded preceding and 0 following
range between unbounded preceding and 0 following
rows between unbounded preceding and current row
range between unbounded preceding and current row
查询每组玩家和日期,以及玩家到目前位置玩了多少游戏,也就是说,再此日期之前玩家所玩的游戏总数。
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-05-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);
select player_id, event_date,
sum(
case
when @pre_player_id = player_id then @n:=@n+games_played
when @pre_player_id:= player_id then @n:=games_played
end
) as games_played_so_far
from (
select * from activity order by player_id, event_date
) temp, (select @pre_player_id:=null, @n:=0) init
group by player_id, event_date;
因为最终的结果是计算每个用户在某天玩游戏的次数,所以需要按照 player_id
和 event_date
分组。
因为 order by
执行的顺序在 sum
函数后面执行,所以这里需要先对 player_id
和 event_date
先进行排序。
具体实现:
activity
按照 player_id
和 event_date
升序排序,命名为 temp
临时表temp
按照 player_id
和 event_date
进行分组case ... when ... then ...end
语句对分组后的 temp
中 games_played
进行输出,并用 sum
求和select
player_id,
event_date,
sum(games_played)
over(partition by player_id order by event_date) as games_played_so_far
from activity;
在 sum
函数后面可以使用 over
对其按照 player_id
分组,并按照 event_date
排序。
select a.player_id, a.event_date, sum(b.games_played) as games_played_so_far
from activity a join activity b
on a.player_id = b.player_id where a.event_date >= b.event_date
group by a.player_id, a.event_date;
activity
自连,连接条件 a.player_id = b.player_id
筛选出 a.event_date >= b.event_date
a
表的 player_id
和 a.event_date
进行分组b.games_played
a.event_date >= b.event_date
也就是说在 a.event_date >= b.event_date
的数据中, a.games_played
都是一样的,是不对的。查询首次登录游戏并且第二天再次登录游戏玩家的比率,四舍五入到小数点后两位。
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-03-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);
select round(select count(player_id) from (
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
/
select count(player_id) from (
select player_id from activity group by player_id
) as temp, 2) as fraction;
需要求出两个值:第一个值是总人数,第二个值是连续登录两天的玩家数
将 activity
表按照 player_id
进行分组在计算出总人数
select count(player_id) from (
select player_id from activity group by player_id
) as temp
还有一种写法:
select count(distinct player_id) from activity
使用窗口函数 lag(event_date)
将日期按照 player_id
分组,并按照 event_date
升序排列,然后偏移一天:
lag(event_date) over(partition by player_id order by event_date) as next_date
使用 rank
对日期按照 player_id
分组,并按照 event_date
升序排列,然后排序:
rank() over(partition by player_id order by event_date) as login_times
组合成完整的 SQL
:
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
筛选出日期相差一天,并且是第一次连续登录,这里 login_times
取值 2
,因为上面按照登录日期进行排序过了,第一天登录是 1
,第二天登录是 2
。
计算出 player_id
的个数就可以算出连续登录两天的玩家了数了:
select count(player_id) from (
select
player_id,
event_date,
lag(event_date)
over(partition by player_id order by event_date) as next_date,
rank() over(partition by player_id order by event_date) as login_times
from activity
) as temp where datediff(event_date, next_date) = 1 and login_times = 2
使用 round()
保留两位小数
select round((
(select count(player_id) from (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) as temp where diff = 1) / (select count(distinct player_id) from activity)
), 2) as fraction;
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
diff
为 1
:select count(player_id) from (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) as temp where diff = 1
round()
保留两位小数with temp as (
select
player_id,
datediff(event_date, min(event_date) over(partition by player_id)) as diff
from activity
) select round(
sum(case diff when 1 then 1 else 0 end) /
count(distinct player_id),
2) as fraction from temp;
temp
,用 with
diff = 1
的和是连续两天登录的用户:sum(case diff when 1 then 1 else 0 end)
temp
中记录了每个用户每次登录的时间差,去重计算 player_id
:count(distinct player_id)
round()
保留两位小数select round(avg(event_date is not null), 2) as fraction from (
select player_id, min(event_date) as first_login from activity
group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
min
算出每个用户第一次登录的日期:select player_id, min(event_date) as first_login from activity
group by player_id
activity
表:select * from (
select player_id, min(event_date) as first_login from activity
group by player_id
) temp left join activity
on temp.player_id = activity.player_id
and datediff(event_date, first_login) = 1;
temp
只有每个用户第一次登录游戏的时间,加上连接条件 datediff(event_date, first_login) = 1
后,不满足条件的用户数据都 null
avg(event_date is not null)
算出平均数
temp
表是每个用户第一次登录的时间activity
表是存储着所有数据temp
左连 activity
筛选出第一次连续两天登录的玩家,满足条件的玩家是有值的,没有满足条件的玩家是 null
event_date is not null
结果是 1
或者 0
,不是日期了,也没有 null
了。round()
计算出平均数create table employee(
id int,
month int,
salary int
);
insert into employee values
(1, 1, 20),
(2, 1, 20),
(1, 2, 30),
(2, 2, 30),
(3, 2, 40),
(1, 3, 40),
(3, 3, 60),
(1, 4, 60),
(3, 4, 70),
(1, 7, 90),
(1, 8, 90);
序号函数有:row_number()
、 rank()
、 dense_rank()
具体的使用方法看:序号函数的使用
偏移量函数是从窗口的首行或者末行开始偏移 n
行
当前分组内,当前行向下偏移,语法 lead(expr, n, default)
expr
:可以是列名或者表达式n
:当前行下第 n
行的值,可选,默认为 1
default
:当前后没有 n
行的值,可选,默认为 null
比如 id = 1, month = 1, lead_salary = 40
,它的 lead_salary = 40
是 id = 1, month = 3
的 salary
SELECT
id,
MONTH,
salary,
lead( salary, 2 , 0) over ( PARTITION BY id ORDER BY `month` ) lead_salary
FROM
employee;
当前分组内,当前行向上偏移,语法 lag(expr, n, default)
用法和 lead
一样
SELECT
id,
MONTH,
salary,
lag( salary, 2 , 0) over ( PARTITION BY id ORDER BY `month` ) lag_salary
FROM
employee;
分布函数的返回值是 0 ~ 1
之间的数
当前 序号 - 1
占 总行数 - 1
的比例: rank() - 1 / total_row() - 1
分子是 rank()
,所以序号可能会重复
SELECT
*,
rank() over ( PARTITION BY id ORDER BY `month` ) as rk,
percent_rank() over ( PARTITION BY id ORDER BY `month` ) as p_rk
FROM
employee;
当前 序号
占 总行数
的比例: rank() / total_row()
分子也是 rank()
,所以序号可能会重复
SELECT
*,
rank() over ( PARTITION BY id ORDER BY `month` ) AS rk,
cume_dist() over ( PARTITION BY id ORDER BY `month` ) AS c_rk
FROM
employee;
当前分组总第一个值,不受 order by
影响
SELECT
*,
first_value(salary) over ( PARTITION BY id ORDER BY `month`) AS first_val
FROM
employee;
和 fitst_value()
有区别, last_value()
并不是当前分组的最后一个值,会收 order by
影响
SELECT
*,
last_value(salary) over ( PARTITION BY id ORDER BY `month`) AS last_val
FROM
employee;
当前分组内,第 n
的值,小于 n
的值,为 null
SELECT
*,
nth_value(salary, 2) over ( PARTITION BY id ORDER BY `month`) AS nth_val
FROM
employee;
当前分组内,分成 n
组,从小到开始,直到分完,分组内总条数不一定被 n
整除,所以不一定平均分配。
SELECT
*,
ntile(2) over ( PARTITION BY id ORDER BY `month`) AS ntile_val
FROM
employee;
什么是 SQL:Structured Query Language 结构化查询语言
SQL 作用:通过 SQL 语句我们可以方便的操作数据库中的数据库、表、数据。SQL 是数据库管理系统都需要遵循的规范。不同的数据库生产厂商都支持 SQL 语句,但都有特有内容。
MySQL 数据库的约束保证数据的正确性、有效性和完整性,包括:主键约束,唯一约束,非空约束
显示所有数据库:
show databases;
显示数据库:
show database <database_name>;
显示创建的数据库信息:
show create database <database_name>;
创建数据库:
create database <database_name>;
判断数据库是否存在,并创建数据库:
create database if not exists <database_name>;
创建数据库并指定字符集:
create database <database_name> character set <utf8>;
使用数据:
use <database_name>;
查看使用数据库:
select database();
修改数据库字符集:
alter database <database_name> default character set <utf8>;
删除数据库:
drop database <database_name>;
查看所有表:
show tables;
创建表:
create table <table_name> (<name1> <type1>, <name2> <type2>);
查看表结构:
desc <table_name>;
查看建表语句:
show create table <table_name>;
创建一个表结构相同的表:
create table <new_table_name> like <old_table_name>;
删除表:
drop table <table_name>;
判断表存在并删除表:
drop table if exists <table_name>;
添加表列:
alter table <table_name> add <col_name> <type>;
修改表列类型:
alter table <table_name> modify <col_name> <type>;
修改列名:
alter table <table_name> change <old_col_name> <new_col_name> <type>;
删除列:
alter table <table_name> drop <col_name>;
修改表名:
rename table <old_table_name> to <new_table_name>;
修改表字符集:
alter table <table_name> character set <utf8>;
插入全部数据:
insert into <table_name>(name1, name2, ...) values(vaule1, value2, ...);
-- 等价于
insert into values(vaule1, value2, ...);
蠕虫复制:
insert into student2(user_name, age) select user_name, age from student;
insert into student2() select * from student;
更新表记录
update <table_name> set <name>=<value>;
update <table_name> set <name>=<value> where <name>=<value>;
删除表记录
delete from <table_name>;
delete from <table_name> where <name>=<value>;
truncate table <table_name>;
查询不会对数据库中的数据进行修改,只是一种显示数据的方式。
查询值:
select * from student;
别名查询: ps: as 可省略不写。
select <old_col_name> as <new_col_name> from student;
查询 name,age 结果不出现重复的 name:
select distinct name, age from student;
查询结果参与运算:ps: 参与运算的必须是数值类型。
select <col_name> + 固定值 from <table_name>;
select <col1_name> + <col2_name> from <table_name>;
查询 id 为 1、3、5 的数据:
select * from <table_name> where id = 1 or id = 3 or id = 5;
-- 等于
select * from <table_name> where id in (1, 3, 5);
查询 id 不为 1、3、5 的数据:
select * from <table_name> where id not in (1, 3, 5);
查询 id 在 3 到 7 之间的数据:(闭合区间)
select * from <table_name> where id between 3 and 7;
模糊查询:
%: 表示 0 个或者多个字符(任意字符)
_: 表示一个字符
select * from <table_name> where <name> like <'通配符字符串'>;
排序:
select * from <table_name> order by age desc, id asc;
聚合函数
select count(<col_name>) from <table_name>;
分组查询,ps: 一般两个 name 是一样的
select count(*),<name> from <table_name> group by <name>;
分组查询后,在进行筛选
having 和 where 的区别
- having 是在分组后对数据进行过滤,where 是在分组前对数据进行过滤
- having 后面可以使用聚合函数,where 后面不可以使用聚合函数
select count(*),<name> from <table_name> group by <name> having count(*) > 2;
限制: limit offset length
select * from <table_name> limit 3,6;
顺序:
select *|字段名 [as 别名] from 表名 [where 子句] [group by 子句] [having 子句] [order by 子句] [limit 子句];
保证数据的正确性、有效性、完整性。
约束种类:
作用:用来唯一标识一条记录,每个表应该有一个主键,并且每个表只能有一个主键。
通过不用业务字段作为主键,单独给每张表设计一个 id 字段,把 id 作为主键。主键是给数据库和程序员使用的,不是给最终客户使用的。主键有没有含义没有关系,只要不重复,非空就行。
create table <table_name> (
id int primary key,
name varchar(20)
);
alter table <table_name> drop primary key;
create table <table_name> (
id int primary key auto_increment,
name varchar(20)
);
-- 等价于
create table <table_name> (
id int auto_increment,
name varchar(20),
primary key(name) USING BTREE
);
alter table <table_name> auto_increment = 100;
不能插入相同名字,但是可以插入两个 null
索引类别:
unique index
:唯一索引normal index
:普通索引索引方式:
BTREE
HASH
create table <table_name> (
id int,
name varchar(20) unique index
);
-- 等价于
create table <table_name> (
id int,
name varchar(20),
UNIQUE INDEX name(name) USING BTREE
);
create table <table_name> (
id int,
name varchar(20) not null,
gender char(2)
);
create table <table_name> (
id int,
name varchar(20),
location varchar(50) default "射手"
);
查找至少有 5
名直接下属的经理
create table employee (
id int,
name varchar(255),
department varchar(255),
managerId int
);
insert into employee values
(101, 'John', 'A', null),
(102, 'Dan', 'A', 101),
(103, 'James', 'A', 101),
(104, 'Amy', 'A', 101),
(105, 'Anne', 'A', 101),
(106, 'Ron', 'B', 101);
select name from employee where id in (
select managerId from employee group by managerId having count(managerId) >= 5
);
managerId
分组,使用 having
筛选出大于等于 5
名下属的经理 id
select managerId from employee group by managerId having count(managerId) >= 5
in
查出经理名字select name from employee, (
select managerId from employee group by managerId having count(managerId) >= 5
) as temp where employee.id = temp.managerId;
managerId
分组,使用 having
筛选出大于等于 5
名下属的经理 id
,作为临时表select managerId from employee group by managerId having count(managerId) >= 5
employee.id = temp.managerId
查询实现分数排名,如果两个分数相同,这排名相同。
create table scores (
id int,
score double
)
insert into scores values(1,3.50),(2,3.65),(3,4.00),(4,3.85),(5,4.00),(6,3.65)
select score,
case
when @pre=score then @rank:=@rank
when @pre:=score then @rank:=@rank + 1
end as 排名
from scores, (select @pre=null, @rank:=0) as init
order by score desc;
score
进行降序排列case ... when ... then ... end
算出排名select score, dense_rank() over(order by score desc) as 排名 from scores;
使用 dense_rank()
排名函数
获取每位玩家第一次登录平台的日期
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-05-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);
select player_id, min(event_date) as first_login from activity
group by player_id;
player_id
将 activity
分组min
函数,求出日期的最小值select player_id, event_date as first_login from (
select
player_id,
event_date,
dense_rank()
over(partition by player_id order by event_date) as 排名
from activity
) as temp where 排名 = 1;
dense_rank
函数按照 player_id
和 event_date
进行排序,并算出排名将其作为临时表 temp
temp
,筛选出 排名 = 1
数据编写一个 SQL 查询,选出每个销售产品的 第一年 的 product_id
、 first_year
、 quantity
****和 price
。
sales
表中的主键是 sale_id
外键是 product_id
product
表中的主键是 product_id
ps:如果第一年有两个价格也需要查询出来
CREATE TABLE sales (
sale_id INT,
product_id INT,
year INT,
quantity INT,
price INT
);
INSERT INTO sales ( sale_id, product_id, year, quantity, price ) VALUES
( 1, 100, 2008, 10, 5000 ),
( 2, 100, 2009, 12, 5000 ),
( 7, 200, 2011, 15, 9000 );
CREATE TABLE product (
product_id INT,
product_name VARCHAR ( 10 )
);
INSERT INTO product ( product_id, product_name ) VALUES
( 100, 'Nokia' ),
( 200, 'Apple' ),
( 300, 'Samsung' );
first_year
和产品id product_id
product
表用不到select
product_id,
year as first_year,
quantity,
price
from (
select
product_id,
year,
quantity,
price,
rank() over(partition by product_id order by year) as rn
from sales
) as temp where rn = 1;
rank()
,按照 product_id
分组, year
升序排序,得到 rn
row_number
temp
temp
筛选出 rn = 1
的数据,因为 rn
是升序排序的,所以 rn = 1
就是第一年select
product_id,
year as first_year,
quantity,
price
from sales where (product_id, year) in (
select product_id, min(year) from sales group by product_id
);
group by
按照 product_id
进行分组,查询出 product_id
和第一年 first_year
sales
表,筛选出第一步中查询出来的 product_id
和 first_year
select
sales.product_id,
first_year,
quantity,
price
from (
select product_id, min(year) first_year from sales group by product_id
) as temp
left join sales on temp.product_id = sales.product_id
and temp.first_year = sales.year;
group by
按照 product_id
进行分组,查询出 product_id
和第一年 first_year
temp
和 sales
左连,连接条件是 product_id
和 first_year = year
选出票数最多的那位,默认没有平局,只有以为当选
create table if not exists candidate (id int, name varchar(255))
insert into candidate (id, name) values (1, 'A'),
(2, 'B'),
(3, 'C'),
(4, 'D'),
(5, 'E');
create table if not exists vote (id int, candidateId int);
insert into vote (id, candidateId) values (1, 2),
(2, 4),
(3, 3),
(4, 2),
(5, 5);
select name from candidate where id = (
select candidateId from vote
group by candidateId
order by count(candidateId) desc limit 1
);
在 vote
表中通过 group by
对 candidateId
分组,并且按照票数排序,然后筛选出得票最高的那人的 id
,接着在 candidate
表中筛选出 id
的 name
select name from candidate, (
select
distinct candidateId,
count(candidateId) over(partition by candidateId) as counts
from vote order by counts desc limit 1
) as temp where candidate.id = temp.candidateId;
方案二和方案一的思路是一样的,只是这里分组使用的是 over(partition by candidateId)
。
select name from candidate join vote on candidate.id = vote.candidateId
group by name order by count(name) desc limit 1;
先将 candidate
表和 vote
表通过 candidate.id
和 vote.candidateId
内联。通过 group by
对 name
分组,并且按照票数排序,然后筛选出得票最高的那人。
abs()
绝对值正常情况下,绝对值的结果
select abs(2); // 2
select abs(-2); // 2
select abs(0); // 0
select abs(-0); // 0
有些极端情况下,输入的值不一定是数字,我们看一下它的结果:
select abs('11'); // 11
select abs('123dd'); // 123
select abs('dd123'); // 0
select abs('mysql'); // 0
select abs(null); // null
select abs(1/0); // null
select abs('2022-01-22'); // 2022
select abs(true); // 1
select abs(false); // 0
select abs(CURRENT_TIMESTAMP()); // 20220128155334
总结:
字符串类型的数字
,先将它转换成数字再做绝对值数字开头加非数字字符串
,取前面数字部分做绝对值非数字开头
,结果是 0
null
,结果是 null
1/0
,结果是 null
true
,结果是 1
false
,结果是 0
字符串是通过特殊符号连接
,则去第一个特殊符号前的值,做转换mod()
取余数正常情况下,取余数的结果
select mod(4, 3); // 1
select mod(4, 2); // 0
select mod(3, 4); // 3
在极端情况下的结果
select mod(4, 0); // null
select mod(4, 'e'); // null
select mod(4, '3'); // 1
select mod(4, '123d'); // 1
select mod(4, true); // 0
select mod(4, 1/0); // null
select mod(4, null); //null
总结:它的转换逻辑
0
或者 null
则结果 null
round()
四舍五入正常情况下,四舍五入的结果
select round(3.24, 1); // 3.2
非正常情况下的结果
select round(3.23, null); // null
select round(3.23, 1/0); // null
总结:算术类的函数,如果输入的值不是数字,会先将它转换成数字,然后在进行后面的操作。
concat()
连接函数正常情况下使用 concat()
的结果
select concat('abc', 123); // 'abc123'
在极端情况下使用结果
select concat('abc', true); // 'abc1'
select concat('abc', false); // 'abc0'
select concat('abc', null); // null
select concat('abc', 1/0); // null
select concat('abc', CURRENT_TIMESTAMP()); // abc2022-01-29 14:52:01
这个函数比较简单,没有太多极端的情况,和 null
连接,结果就是 null
length()
计算长度一个汉字算三个字符,一个数字或者字母算一个字符
select length('abc'); // 3
select length(23); // 2
select legnth(1.2); // 3
select length("你好"); // 6
极端情况的结果
select length(true); // 1,true 是 1 所以结果是 1
select length(null); // null
char_length()
字符长度字母,数字,汉字都算一个字符
select char_length('abc'); // 3
select char_length('你好'); // 2
select char_length(123); // 3
select char_length(true); // 1
select char_legnth(null); // null
lower()
、 upper()
大小写函数select lower('FF'); // 'ff'
select lower(1); // 1
select lower(true); // 1
select lower(null); // null
select lower(1/0); // null
select lower('你好'); // '你好'
upper()
和 lower()
用法一样,如果是不是字母,输出原值
replace()
替换函数select replace('abc', 'c', 'd'); // 'abd'
select replace('abc', 'd', 'D'); // 'abc'
select replace('abc', 'c', true); // 'ab1'
select replace('abc', 'c', null); // null
select replace(true, 1, 'abc'); // 'abc'
select replace(true, true, 'abc'); // 'abc'
select replace(true, 'a', 34); // 1
substring(string, start, length)
截取字符串函数start
表示开始截取的位置,length
数字表示截取的长度,
start
从 1
开始,不是 0
start
如果是负数,表示从后往前开始select substring('abcd', 1, 3); // 'abc'
select substring('abcd', 0, 3); // 空
select substring('abcd', 3, 1); // 'c'
select substring('abcd', 3, -1); // 空
select substring('abcd', 1, 1); // 'a'
select substring('abcd', 2, 2); // 'bc'
select substring('abcd', -1, 1); // 'd'
select substring('abcd', 1, null); // null
select substring('abcd', 1, 'd'); // 空
current_time()
获取当前系统的时间
current_date()
获取当前系统的日期
current_timestamp()
获取当前系统的时间 + 日期
extract()
获取具体的年月日
date()
获取时间的日期部分
year()
获取时间的年份部分
month()
获取时间的月份部分
day()
获取时间的天数部分
hour()
获取时间的小时部分
minute()
获取时间的分钟部分
second()
获取时间的秒部分
select current_time(); // 12:01:34
select current_date(); // 2022-01-30
select current_timestamp(); // 2022-01-30 12:01:34
select extract(year from '2022-01-30 12:01:34'); // 2022
select date('2022-01-30 12:01:34'); // 2021-01-30
select year('2022-01-30 12:01:34'); // 2022
select month('2022-01-30 12:01:34'); // 01
select day('2022-01-30 12:01:34'); // 30
select hour('2022-01-30 12:01:34'); // 12
select minute('2022-01-30 12:01:34'); // 01
select second('2022-01-30 12:01:34'); // 34
cast()
数据类型转换参数是个表达式,通过 as
分割 2 个参数,一个是原始数据,一个是目标类型
select cast(12.3 as signed); // 12
select cast(12.3 as char); // 12.1
将字符串数字转成 int
类型会报错,转成 float
类型就不会报错
select cast('12' as int); // 报错
select cast('12' as float); // 12
还可以使用 decimal()
指定精度,接收两个参数,第一个参数是精度位(包括小数部分),第二个参数是小数位数
下面的 SQL 语句中,为什么两个输出值是一样的?
因为小数位是两个,小数位加整数位不够八位,所以,最后呈现出来的是 6
。
select cast('1234.56789' as decimal(8, 2)); // 1234.57
select cast('1234.56789' as decimal(6, 2)); // 1234.57
coalesce()
返回第一个非空数值select coalesce(null, null, 2); // 2
select coalesce(null, 1); // 1
找出每个部门获得前三高工资的所有员工
create table employee (
id int primary key auto_increment,
name varchar(255),
salary int,
departmentId int
);
create table department (
id int primary key auto_increment,
name varchar(255)
);
insert into employee (name, salary, departmentId) values
('joe', 85000, 1),
('henry', 80000, 2),
('sam', 60000, 2),
('max', 90000, 1),
('janet', 69000, 1),
('randy', 85000, 1),
('will', 70000, 1);
insert into department(name) values('it'),('sales');
select department.id, employee.name, employee.salary from (
select te.departmentId, te.salary,
case ①
when @pre=departmentId then @rank:=@rank + 1
when @pre:=departmentId then @rank:=1
end as 排名
from (select @pre:=null, @rank:=0) tt,
(
select departmentId, salary from employee
group by departmentId, salary
order by departmentId, salary desc
) te
) t
inner join department on t.departmentId = department.id
inner join employee on t.departmentId = employee.departmentId
and employee.salary = t.salary and 排名 <= 3
order by t.departmentId, t.salary desc;
employee
按照 departmentId
和 salary
进行分组,将这个临时表命名为 te
case ... when ... then ... end
和变量根据薪水算出排名,将这个临时表命名为 t
。inner join
分别连接 department
和 employee
t
表和 department
表连接条件是 t.departmentId = department.id
t
表和 employee
表连接条件是 t.departmentId = employee.departmentId and t.salary = employee.salary
②① case
语句中 when
应该是条件,这里为什么用赋值 :=
case
时, when @prev = departmentId then ...
执行的时,此时 @prev
为 null
不满足条件,所以它就会执行 when @prev := departmentId then ...
,此时 @prev
为第一条数据的 departmentId
由于赋值语句肯定为 true
,所以 @rank
就为 1
case
时, when @prev = departmentId then ...
由于 @prev
有值了,下面的 when
就不会执行了。case
时, when @prev = departmentId then ...
,第三条数据的 departmentId
是一个新的值,此时不满足 @prev = departmentId
,就会进入第二个 when @prev := departmentId then ...
将 departmentId
的最新值赋值给 @prev
② 这里为什么用 t.salary = employee.salary
而不用 t.name = employee.name
t.salary = employee.salary
作用是确定是同一个用,这里就有个问题,确定同一个人的话,为什么不用 name
做条件呢? 这里是因为 te
按照 salary
和 department
进行分组,不考虑 name
的原因是可能会有两个人的 salary
是一样的,如果在加上 name
的话,就会出现两个 salary
相同的人,排名不一样。
现在给的数据有两个人的 salary
是一样的,可以将其中一个 salary
修改一下,就可以知道结果了。
create table sequence (id int, name varchar(10), age int);
insert into sequence values
(1, '张三', 19),
(2, '李四', 20),
(3, '王五', 30),
(4, '赵六', 24),
(5, '小红', 19);
使用 select * from sequence
查询出来结果,是默认排序的,如下图:
id | name | age |
---|---|---|
1 | 张三 | 19 |
2 | 李四 | 20 |
3 | 王五 | 30 |
4 | 赵六 | 24 |
5 | 小红 | 19 |
下面使用分别使用 row_number()
, rank()
, dense_rank()
排序,看看是什么效果。
row_number()
根据顺序排序,不会重复将 age
进行升序排列,需要结合 over()
来使用
select *, row_number() over(order by age) as 排名 from sequence;
如图所示:
id | name | age | 排名 |
---|---|---|---|
1 | 张三 | 19 | 1 |
5 | 小红 | 19 | 2 |
2 | 李四 | 20 | 3 |
4 | 赵六 | 24 | 4 |
3 | 王五 | 30 | 5 |
查询出来的结果是按照 age
进行升序排列,不重复, age
相同的数据按照它的默认顺序进行展示,如果想自定义相同 age
的顺序,和 order by
字段使用的方法一致
rank()
排序相同时,会重复将 age
进行升序排列,需要结合 over()
来使用
select *, rank() over(order by age) as 排名 from sequence;
如图所示:
id | name | age | 排名 |
---|---|---|---|
1 | 张三 | 19 | 1 |
5 | 小红 | 19 | 1 |
2 | 李四 | 20 | 3 |
4 | 赵六 | 24 | 4 |
3 | 王五 | 30 | 5 |
相同的 age
排名一样,也就是说这里没有第二名。
不过需要注意的是,不能自定义相同 age
的顺序,否则它的排名会变化
age
相同,按照 id
降序排列
select *, rank() over(order by age, id desc) as 排名 from sequence;
如图所示:
id | name | age | 排名 |
---|---|---|---|
5 | 小红 | 19 | 1 |
1 | 张三 | 19 | 2 |
2 | 李四 | 20 | 3 |
4 | 赵六 | 24 | 4 |
3 | 王五 | 30 | 5 |
会根据 age
和 id
两个维度进行排序,也就不存在两个并列第一名了。
dense_rank()
排序相同时,会重复,但不会跳过重复的排名将 age
进行升序排列,需要结合 over()
来使用
select *, dense_rank() over(order by age) as 排名 from sequence;
如图所示:
id | name | age | 排名 |
---|---|---|---|
1 | 张三 | 19 | 1 |
5 | 小红 | 19 | 1 |
2 | 李四 | 20 | 2 |
4 | 赵六 | 24 | 3 |
3 | 王五 | 30 | 4 |
age
相同的进行排序,并列第一名,但是后面的数据并不会跳过第二名,而是紧接着第一名排序
它和 rank
一样,不能自定义相同 age
的顺序,否则它的排名也会变化,因为排序会按照两个维度来。
将 2016 年 (TIV_2016) 所有成功投资的金额加起来,保留 2 位小数。
对于一个投保人,他在 2016 年成功投资的条件是:
create table insurance (
pid int,
tiv_2015 float,
tiv_2016 float,
lat float,
lon float
);
insert into insurance (pid, tiv_2015, tiv_2016, lat, lon) values
('1', '10', '5', '10', '10'),
('2', '20', '20', '20', '20'),
('3', '10', '30', '20', '20'),
('4', '10', '40', '40', '40');
这题的难点有两个:
解决了这两个方法,这题就解决了。 方法一使用的是窗口函数,方法二使用的是子查询
select round(sum(tiv_2016), 2) tiv_2016 from (
select
tiv_2016,
count(*) over(partition by tiv_2015) count_tiv_2015,
count(*) over(partition by lat, lon) count_lat_lon
from insurance
) as temp where count_lat_lon = 1 and count_tiv_2015 > 1
使用窗口函数进行分组
筛选出一个人在 2015 年投保额至少和任何一个人在 2015 年的投保额相同,就是按照 2015 年的投保额进行分组,并计算个数
count(*) over(partition by tiv_2015) count_tiv_2015
筛选不同的地点,就是按照经纬度进行分组,计算每个每个经纬度的个数
count(*) over(partition by lat, lon) count_lat_lon
将 1
、 2
两步和 tiv_2016
字段作为临时表 temp
查询 temp
筛选出问题中的两步
count_tiv_2015 > 1
投保额 tiv_2015
至少有两个人是相等的count_lat_lon = 1
经纬度是唯一的通过筛选条件后使用 sum()
计算出 tiv_2016
并且使用 round()
保留两个小数
select round(sum(tiv_2016), 2) tiv_2016 from insurance where tiv_2015 in (
select tiv_2015 from insurance group by tiv_2015 having count(*) > 1
) and concat(lat, lon) in (
select
concat(lat, lon)
from insurance group by lat, lon having count(*) = 1
);
使用子查询
筛选出一个人在 2015 年投保额至少和任何一个人在 2015 年的投保额相同,使用 group by
按照 tiv_2015
分组,并且使用 count()
计算,筛选出大于 1
的数据,因为大于 1
代表至少有两个人在 2015 年投保额相同
select tiv_2015 from insurance group by tiv_2015 having count(*) > 1;
筛选不同的地点,就是按照 lat
和 lon
进行分组,这里查询的字段是使用 concat()
连接 lat, lon
后的值,并且使用 count()
计算,筛选出等于 1
的数据,因为等于 1
代表地点唯一
ps:使用 in
操作会消耗性能
select
concat(lat, lon)
from insurance group by lat, lon having count(*) = 1;
查询 insurance
,筛选条件是两个子查询, tiv_2015 in 1
和 concat(lat, lon) in 2
通过筛选条件后使用 sum()
计算出 tiv_2016
并且使用 round()
保留两个小数
查出超过或等于 5
名学生的课(学生的课不被重复计算)
create table courses (
student varchar(255),
class varchar(255)
)
insert into courses values
('A', 'Math'),
('B', 'English'),
('C', 'Math'),
('D', 'Biology'),
('E', 'Math'),
('F', 'Computer'),
('G', 'Math'),
('H', 'Math'),
('I', 'Math');
select class from courses group by class having count(*) >= 5;
学生的课不重复,所以按照 class
分组,在使用 having
筛选出大于等于 5
课的同学
select class from (
select class, count(*) as num from courses group by class
) as c where num >= 5;
group by
分组5
人。题目链接:行程和用户
查询出 2013-10-01
和 2013-10-03
期间非禁止用户(乘客和司机都必须未被禁止)的取消率,非禁止用户即 banned
为 No
的用户,禁止用户即 banned
为 Yes
的用户。
取消率需要四舍五入保留两位小数
其中 client_id
和 driver_id
对应 users
表中的 users_id
create table trips (
id int,
client_id int,
driver_id int,
city_id int,
status char(255),
request_at date
);
insert into trips values(1, 1, 10, 1, 'completed', '2013-10-01');
insert into trips values(2, 2, 11, 1, 'cancelled_by_driver', '2013-10-01');
insert into trips values(3, 3, 12, 6, 'completed', '2013-10-01');
insert into trips values(4, 4, 13, 6, 'cancelled_by_client', '2013-10-01');
insert into trips values(5, 1, 10, 1, 'completed', '2013-10-02');
insert into trips values(6, 2, 11, 6, 'completed', '2013-10-02');
insert into trips values(7, 3, 12, 6, 'completed', '2013-10-02');
insert into trips values(8, 2, 12, 12, 'completed', '2013-10-03');
insert into trips values(9, 3, 10, 12, 'completed', '2013-10-03');
insert into trips values(10, 4, 13, 12, 'cancelled_by_driver', '2013-10-03');
create table users (
users_id int,
banned char(255),
role char(255)
);
insert into users value(1, 'No', 'client');
insert into users value(2, 'Yes', 'client');
insert into users value(3, 'No', 'client');
insert into users value(4, 'No', 'client');
insert into users value(10, 'No', 'driver');
insert into users value(11, 'No', 'driver');
insert into users value(12, 'No', 'driver');
insert into users value(13, 'No', 'driver');
select
trips.request_at as 日期,
round(sum(if(trips.status = 'completed', 0, 1)) / count(trips.status), 2) as 取消率
from trips
join users u1 on trips.client_id = u1.users_id and u1.banned = 'No'
join users u2 on trips.driver_id = u2.users_id and u2.banned = 'No'
where trips.request_at between '2013-10-01' and '2013-10-03'
group by 日期;
错误的思路:
if (client_id = users_id or driver_id = users_id) and users_id没有被禁止
...
只要 client_id
和 driver_id
只要有一个满足条件,就会被查出来
SQL
代码
select * from trips join users on (trips.client_id = users.users_id
or trips.driver_id = users.users_id) and users.banned = 'No';
这样查出的结果没有排除掉 users_id = 2
的用户
正确的思路:
if(client_id = users_id and users_id没有禁止
and driver_id = users_id and users_id没有被禁止)
...
client_id
和 driver_id
要分别和自己关联的 users_id
判断是否被禁止。
SQL
代码
select * from trips
join users u1 on trips.client_id = u1.users_id and u1.banned = 'No'
join users u2 on trips.driver_id = u2.users_id and u2.banned = 'No';
if
语法: if(expr1, expr2, expr3)
如何 expr1
为 true
则输出为 expr2
否则输出为 expr3
select
trips.request_at as 日期,
round(sum(if(trips.status = 'completed', 0, 1)) / count(trips.status), 2) as 取消率
from trips
left join (
select users_id from users where banned = 'Yes'
) as a1 on trips.client_id = a1.users_id
left join(
select users_id from users where banned = 'Yes'
) as a2 on trips.driver_id = a2.users_id
where a1.users_id is null and a2.users_id is null
and trips.request_at between '2013-10-01' and '2013-10-03'
group by 日期;
找出被禁止的用户
select users_id from users where banned = 'Yes';
错误的思路:
select * from trips, (
select users_id from users where banned = 'Yes'
) as a
where trips.client_id != a.users_id and trips.driver_id != a.users_id
有两点错误:
a
表为空时,最后的结果是为空的!=
是查不出结果的修改成这样就可以了
select * from trips where
trips.client_id in (select users_id from users where banned = 'No')
and
trips.driver_id in (select users_id from users where banned = 'No');
如果不使用 in
需要换一种思路
正确的思路:
select
trips.request_at as 日期,
round(sum(if(trips.status = 'completed', 0, 1)) / count(trips.status), 2) as 取消率
from trips
left join (
select users_id from users where banned = 'Yes'
) as a1 on trips.client_id = a1.users_id
left join (
select users_id from users where banned = 'Yes'
) as a2 on trips.driver_id = a2.users_id
where a1.users_id is null and a2.users_id is null
and trips.request_at between '2013-10-01' and '2013-10-03'
group by 日期;
查出被禁止的用户作为表 a
和 trips
表做两次左连:
trips.client_id = a1.users_id
trips.driver_id = a2.users_id
因为 trips
作为主体表,和 a
表左连,连接的结果是被禁止的用户行程数据
而我们要的结果是非禁止用户的形成数据, a
中 users_id
为 null
的就是我们要的数据
所以最后的筛选条件是 a1.users_id is null and a2.users_id is null
这种写法还可以反过来写,查出没被禁止的用户 a
和 trips
做内连,就不会有 null
存在
select
trips.request_at as 日期,
round(sum(if(trips.status = 'completed', 0, 1)) / count(trips.status), 2) as 取消率
from trips
join (
select users_id from users where banned = 'No'
) as a1 on trips.client_id = a1.users_id
join (
select users_id from users where banned = 'No'
) as a2 on trips.driver_id = a2.users_id
where trips.request_at between '2013-10-01' and '2013-10-03'
group by 日期;
写一条SQL 查询语句获取 sales
表中所有产品对应的 产品名称 product_name
以及该产品的所有 售卖年份 year
和 价格 price
。
sale_id
, year
) 是销售表 sales
的主键sales
表中的 product_id
是关联到产品 product
的外键product
表中的 product_id
是主键CREATE TABLE sales (
sale_id INT,
product_id INT,
year INT,
quantity INT,
price INT
);
INSERT INTO sales ( sale_id, product_id, year, quantity, price ) VALUES
( '1', '100', '2008', '10', '5000' ),
( '2', '100', '2009', '12', '5000' ),
( '7', '200', '2011', '15', '9000' );
CREATE TABLE product (
product_id INT,
product_name VARCHAR ( 10 )
);
INSERT INTO product ( product_id, product_name ) VALUES
( '100', 'Nokia' ),
( '200', 'Apple' ),
( '300', 'Samsung' );
sales
表中的外键 product_id
和 product
表中的主键 product_id
做关联所以方法一和方法二都是使用这种方法的不同形式。
select product_name, year, price from sales join product using(product_id)
使用 join
方法连接两张表,连接条件是 product_id
。
select
product_name,
year,
price
from sales, product where sales.product_id = product.product_id
两表查询,通过 where
连接两张表之间的 product_id
。
查询所有数字的中位数,结果四舍五入为一位小数。
解析:
frequency
是 num
出现的次数numbers
表可得到 [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3]
create table numbers(
num int,
frequency int
)
insert into numbers values
(0, 7),
(1, 1),
(2, 3),
(3, 1);
select avg(num) as median from (
select
num,
sum(frequency) over(order by num desc) desc_frequency,
sum(frequency) over(order by num asc) asc_frequency,
sum(frequency) over() total_frequency
from numbers
) as temp
where desc_frequency >= total_frequency / 2
and asc_frequency >= total_frequency / 2;
中位数就是将所有数字按照升序或者降序排列,然后取最中间的数字
步骤:
sum(frequency) over() total_frequency
计算出所有数字的个数,这里使用窗口函数 over()
就不需要再后面使用 group by
select sum(frequency) as total_frequency from numbers
sum(frequency) over(order by num desc) desc_frequency
使用窗口函数 over(order by num desc)
按照 num
降序计算出当前数字和之前数字出现的次数
select num, sum(frequency) over(order by num desc) desc_frequency
from numbers;
num | desc_frequency |
---|---|
3 | 1 |
2 | 4 |
1 | 5 |
0 | 12 |
sum(frequency) over(order by num asc) asc_frequency
使用窗口函数 over(order by num asc)
按照 num
升序计算出当前数字和之前数字出现的次数
select num, sum(frequency) over(order by num asc) asc_frequency
from numbers;
num | asc_frequency |
---|---|
0 | 7 |
1 | 8 |
2 | 11 |
3 | 12 |
num
, desc_frequency
, asc_frequency
, total_frequency
作为临时表 temp
temp
, 筛选条件是 desc_frequency >= total_frequency / 2 and asc_frequency >= total_frequency / 2
, desc_frequency
的一半就是中位数num
就是中位数,使用 avg
对其求平均数,因为如果是偶数个的话,查出来的中位数是两个。select avg(num) as median from (
select
num,
(
select sum(frequency) from numbers n2 where n1.num >= n2.num
) asc_frequency,
(
select sum(frequency) from numbers n3 where n1.num <= n3.num
) desc_frequency,
(select sum(frequency) from numbers) total_frequency
from numbers n1
) temp
where asc_frequency >= total_frequency / 2
and desc_frequency >= total_frequency / 2;
和方法一的思路是一样的,没有使用窗口函数 over()
,而是使用查询的方式并通过 sum()
计算出 asc_frequency
、 desc_frequency
、 total_frequency
select avg(num) as median from (
select n1.num from numbers n1 join numbers n2 group by n1.num
having sum(if(n1.num >= n2.num, n2.frequency, 0)) >= sum(n2.frequency) / 2
and sum(if(n1.num <= n2.num, n2.frequency, 0)) >= sum(n2.frequency) / 2
) as temp;
numbers
表自连接,并按照 n1.num
分组if(n1.num >= n2.num, n2.frequency, 0)
,第二个参数为什么使用 n2.frequency
是因为按照 n1.num
进行分组的,再使用 sum()
计算 n1.num
比 n2.num
大的 n2.frequency
n1.num = 0
只会大于等与 n2.num = 0
,只会计算 0
的 n2.frequency
n1.num = 1
符合条件的有 n2.num = 0, n2.num = 1
,就会计算 0
和 1
的 n2.frequency
if(n1.num >= n2.num, n2.frequency, 0)
,也是同样的流程这三种方法都是利用某一数的正序或者逆序的累计大于或等于所有数的个数的一半
查询每个专业的学生人数(没有学生的专业也需要列出),将查询结果按照学生人数降序排列,如果两个或者两个以上的专业有相同的学生人数,按照专业名的字典从小到大排列
create table student(
student_id int,
student_name varchar(255),
gender varchar(255),
dept_id int
);
insert into student values
(1, 'Jack', 'M', 1),
(2, 'Jane', 'F', 1),
(3, 'Mark', 'M', 2);
create table department(
dept_id int,
dept_name varchar(255)
);
insert into department values
(1, 'Engineering'),
(2, 'Science'),
(3, 'Law');
SELECT
dept_name,
count( student_id ) AS student_number
FROM
department
LEFT JOIN student ON department.dept_id = student.dept_id
GROUP BY
dept_name
ORDER BY
dept_name DESC
这题有两个要点:
count
处理 null
时的问题第一点,因为现在要计算每个专业的人数,如果是 student
左连 department
就会忽略掉没有人的专业,造成最后专业缺失,所以要用 department
左连 student
第二点, count(*)
计算的是总行数,不会忽略 null
,但是这里有些专业没有人,计算出来应该是 0
,所以应该使用 count(student_id)
找出所有至少连续出现三次的数字
create table logs(id int, num int)
insert into logs values
(1, 1),
(2, 1),
(3, 1),
(4, 2),
(5, 1),
(6, 2),
(7, 2);
select distinct l.num from (
select num,
if(@n = num, @c:=@c + 1, @c:=1) as c,
@n:=num from logs, (select @n:=0, @c:=0) as init
) as l where l.c >= 3;
利用计数来实现,如果这个数和上个数相同,就 +1
,否则重新赋值为 1
。
select distinct a.num from logs a
left join logs b on a.id = b.id + 1
left join logs c on a.id = c.id + 2
where a.num = b.num and a.num = c.num;
将 logs
表自身连接三次,因为要连续重复出现 3 个数字,所以连接条件是 a.id = b.id + 1
和 a.id = c.id + 2
然后在筛选出数字一样的值,对最后结果进行 distinct
就行了。
将所有数据 sex
的 f
和 m
对换,仅使用单个 update
语句,且不产生中间临时表
create table salary (
id int primary key,
name varchar(255),
sex varchar(255),
salary int
)
insert into salary values
(1, 'A', 'm', 2500),
(2, 'B', 'f', 1500),
(3, 'C', 'm', 5500),
(4, 'D', 'f', 500);
update salary set sex = case sex when 'm' then 'f' else 'm' end;
如果 sex = 'm'
, 那么就设置为 f
,否则设置为 m
。
使用 case sex when 'm' then 'f' else 'm' end
就可以实现
update salary set sex = if(sex = 'm', 'f', 'm');
思路和方法一一样
这是使用 if(sex = 'm', 'f', 'm')
update salary set sex = char(ascii(sex) ^ ascii('m') ^ ascii('f'));
每一个字母都对应一个 ascii
,使用 ascii()
就可以把字母转换成 ascii
。
在利用异或,相同的数为 0
, 0
和任何数进行异或都是那个数。
再利用 char
将 ascii
码转换成字母
具体的步骤:
ascii('m')
为 109
, ascii('f')
为 102
ascii('m') ^ ascii('m') ^ ascii('f')
,所以转化成 109 ^ 109 ^ 102
, 109
和 109
异或为 0
, 0
和 102
异或为 102
,在用 char(102)
得出结果为 m
。ascii('f') ^ ascii('m') ^ ascii('f')
,所以转化成 102 ^ 109 ^ 102
,可以交换下 109
和 102
的位置,所以 102
和 102
异或为 0
, 0
和 109
异或为 109
,在用 char(109)
得出结果为 f
。date_add
对指定的日期加上间隔时间,得到一个新的日期语法: date_add(date, interval num type)
,第一个参数是 date
或者 date-time
都行,第二个参数是有三部分组成的: interval
关键字,间隔时间,间隔单位。
type
取值:
microsecond | 毫秒 |
---|---|
second | 秒 |
minute | 分钟 |
hour | 小时 |
day | 天 |
week | 周 |
month | 月 |
quarter | 季 |
year | 年 |
select date_add('2022-01-03', interval 1 day); // 2022-01-04
如果日期是个无效的日期,返回值是 null
select date_add('2022-01-33', interval 1 day); // null
如果第一个参数只有日期,没有时间,也可以写间隔时间的
select date_add('2022-01-03', inteval 1 minute); // 2022-01-03 00:01:00
间隔时间也可以是负数
select date_add('2022-01-03', interval -1 day); // 2022-01-02
可以接受复合时间,不过时间要用引号,能用的复合时间只有这五种
year_month
day_hour
hour_minute
minute_second
second_microsecond
其他的复合时间会报错,有两个例外
day_minute
结果和 hour_minute
一样。minute_microsecond
结果和 second_microsecond
一样select date_add('2022-01-03', interval '1-1' day_hour); // 2022-01-04 01:00:00
-----
select date_add('2022-01-03', interval '1-1' day_minute); // 2022-01-03 01:01:00
select date_add('2022-01-03', interval '1-1' hour_minute); // 2022-01-03 01:01:00
复合时间的 -
号要写在最前面
select date_add('2022-01-03', '-1-1', day_hour); // 2022-01-01 23:00:00
adddate
和 date_add
一样语法: adddate(date, interval num type)
或者 adddate(date, num)
第二种语法是一种简写,只是在天的基础上增加间隔时间。
select adddate('2022-01-03', 1); // 2022-01-04
datediff()
计算两个日期之间的间隔差语法 datediff(date1, date2)
, date1
是起始时间, date2
是截止时间
select datediff('2021-12-31', '2022-01-02'); // -2
如果时间不存在,返回 null
select datediff('2021-12-32', '2022-01-02'); // null
只有时间的日期部分是参与计算的。
select datediff('2021-12-31 23:59:59', '2022-01-02 00:00:00'); // -2
timediff()
计算两个时间之间间隔差语法 timediff(date-time1, date-time2)
, date-time1
是开始时间, date-time2
是结束时间。
select timediff('2022-01-01 00:00:00', '2022-01-02 00:00:00'); // -24:00:00
计算最大的时间差 838:59:59
, 35
天不到, 35
天是 840
小时。
select timediff('2021-11-10 00:00:00', '2022-01-02 00:00:00'); // -838:59:59
如果时间或者日期不存在,则返回 null
select timediff('2022-01-01 40:00:00', '2022-01-02 00:00:00'); // null
select timediff('2022-01-41 00:00:00', '2022-01-02 00:00:00'); // null
如果不传时间,有个有意思的现象:
00:00:00
,date-time1 > date-time2
返回 00:00:01
date-time1 < date-time2
返回 -00:00:01
select timediff('2022-01-31', '2022-01-02'); // 00:00:00
select timediff('2022-01-02', '2021-12-31'); // 00:00:01
select timediff('2021-12-31', '2022-01-02'); // -00:00:01
查询回答率最高的问题,如果回答率相同,返回 question_id
最小的那个。
解释:
285
显示 1
次,回答 1
次,回答率为 1.0
369
显示 1
次,回答 0
次,回答率为 0.0
create table surveyLog (
id int,
action varchar(255),
question_id int,
answer_id int,
q_num int,
timestamp int
);
insert into surveyLog values
(5, 'show', 285, null, 1, 123),
(5, 'answer', 285, 124124, 1, 124),
(5, 'show', 369, null, 2, 125),
(5, 'skip', 369, null, 2, 126);
select question_id as survey_log from (
select
question_id,
sum(case action when 'answer' then 1 else 0 end) /
sum(case action when 'show' then 1 else 0 end) as rate
from surveyLog group by question_id order by rate desc
) as temp limit 1;
sum
和 case
计算出回答率 rate
,并且升序排列,作为临时表 temp
temp
取第一条数据select action_answer.question_id as survey_log from (
select question_id, count(*) as answer_count from surveyLog
where action = 'answer' group by question_id
) as action_answer join (
select question_id, count(*) as show_count from surveyLog
where action = 'show' group by question_id
) as action_show using(question_id)
order by answer_count / show_count desc limit 1;
action = answer
的数据存为一张临时表 action_answer
action = show
的数据作为一张临时表 show_answer
question_id
连接两表order by
对回答进行排列,取第一条数据select question_id as survey_log from surveyLog
group by question_id
order by avg(action = 'answer') desc limit 1;
一道题只有回答了 action
才是 answer
,这里通过计算每道题的 action = 'answer'
的平均数,因为这里计算平均数的分数是 question_id
的个数,所以 action = 'answer'
个数越多,回答率越高,最后取第一条数据。ps:这里默认 question_id
是升序排列的
select question_id as survey_log from surveyLog
group by question_id
order by count(answer_id) desc limit 1;
一道题只有回答了才有 answer_id
,所以计算出每道题有多少个 answer_id
, answer_id
数量最多的就是回答率最高的,ps:这里默认 question_id
是升序排列的
查找出每一个玩家首次登录的设备名称
create table activity (
player_id int,
device_id int,
event_date date,
games_played int
);
insert into activity values
(1, 2, '2016-03-01', 5),
(1, 2, '2016-05-02', 6),
(1, 3, '2017-06-25', 1),
(3, 1, '2016-03-02', 0),
(3, 4, '2018-07-03', 5);
select activity.player_id, activity.device_id from activity, (
select player_id, min(event_date) as first_login
from activity group by player_id
) as temp
where activity.player_id = temp.player_id
and activity.event_date = temp.first_login;
player_id
最小的 event_date
命名为 temp
临时表activity
和 temp
连查,筛选出 player_id
相等,并且 activity.event_date = temp.first_login
相等的数据select player_id, device_id from (
select
player_id,
device_id,
dense_rank() over(partition by player_id order by event_date) as 排名
from activity
) as temp where temp.排名 = 1;
这个方法和 游戏玩法分析 I
中的方法二一样。
select player_id, device_id from activity a1 where event_date <= all(
select a2.event_date from activity a2 where a1.player_id = a2.player_id
);
使用 all
配合 <=
筛选出 a1.player_id = a2.player_id
的 event_date
数据
编写一个 SQL 查询,获取 employee
表中第 N 高的薪水(salary)。
create table employee (
id int primary key auto_increment,
salary int
);
insert into employee (salary) values(100),(200),(300);
create function getNthHighestSalary(N int) returns int
begin
return(
select e.salary from (
select salary,
case when id, salary is not null then @rank:=@rank + 1 end as 排名
from (select @rank:=0) as init, (
select salary from employee
group by id, salary
order by salary desc
) as t
) as e where e.排名 = N
)
end
外面加了一个函数,把 where
条件变成变量。
create function getNthHighestSalary(N int) returns int
declare m int;
set m = N - 1;
begin
return(
select salary from employee group by salary order by salary desc limit m, 1
);
end
limit
这里没有用变量时,是从 0
开始的,这里的 N
是从 1
开始的,所里这里要减 1
。declare
声明变量 m
set
对变量 m
进行赋值create function getNthHighestSalary(N int) returns int
begin
return (
select salary from (
select id, salary,
@rank:= if(salary is not null, @rank + 1, @rank) as 排名
from employee, (select @rank:=0) as init
group by id, salary
order by salary desc
) as e where 排名 = N
);
end
和方法一一样,把 case
换成了 if
查找出每行人数大于 100
且 id
连续的三行或者更多好记录
create table stadium (
id int,
visit_date date,
people int
)
insert into stadium values
(1, '2017-01-01', 10),
(2, '2017-01-02', 109),
(3, '2017-01-03', 150),
(4, '2017-01-04', 99),
(5, '2017-01-05', 145),
(6, '2017-01-06', 1455),
(7, '2017-01-07', 199),
(8, '2017-01-09', 188);
select * from (
select *, count(*) over(partition by diff) as counts from (
select *, id - row_number() over(order by id) as diff from stadium
where people >= 100
) as base
) as s where counts >= 3;
问题的关键在于如何知道哪几天的 id
是连续的。
这里用的方法是先筛选出 peope
大于等于 100
的数据,然后对这些数据进行排名,在用 id
减去排名,如果他们之间的差一样,就说明那几天是连续的。
具体步骤:
people
大于等于 100
的数据row_number()
对 id
计算出排名id
减去排名,计算出 id
和排名之间的差(作为临时表 base
)base
进行查询并按照 diff
进行分组,命名为 counts
(作为临时表 s
)
over(partition by diff)
比 group by
更加准确。因为 group by
需要和 select
字段一一对应。s
表进行查询,筛选出 counts
大于等于 3
的数据select * from (
select *, (id - (@rrk:=@rrk + 1)) as diff
from stadium, (select @rrk:=0) as init where people >= 100
) as s1 where diff in (
select diff from (
select (id - (@rk:=@rk + 1)) as diff
from stadium, (select @rk:=0) as init where people >= 100
) as s2 group by diff having count(*) >= 3
);
和方法一的思路是一样的,实现的方式不一样,这里是通过 group by
进行分组,所以相对于使用 partition by
的方式步骤更加复杂一点。
people
大于等于 100
的数据。diff
(也可以使用 row_number()
),作为临时表 s1
。s1
表,筛选出我们想要的数据diff
的分组就好了,我们可以通过 in
来筛选。diff
的步骤,然后作为临时表 s2
。diff
,就是 select diff from s2
,使用 group by
对 diff
进行分组,并用 having
筛选出大于等于 3
的 diff
group by
需要和 select
字段一一对应,如果不做这一次查询,使用 group by
将会有问题。s1
表,使用 in
操作符,就能把数据查出来了。在一条 SQL 中不能用两个一样的变量
查找与之前(昨天)日期相比温度更高的所有日期的 id
。
create table weather (
id int primary key auto_increment,
recordDate date,
temperature int
);
insert into weather(recordDate, temperature) values
('2015-01-01', 10),
('2015-01-02', 25),
('2015-01-03', 20),
('2015-01-04', 30);
select weather.id from weather join weather w1
on datediff(weather.recordDate, w1.recordDate) = 1
and weather.temperature > w1.temperature;
只有一张表,现在要找出今天温度比昨天温度高的日期 id
。
所以需要用自连接,也就是把 weather
和 weather
进行自身连接。
在自连之后,需要将自连后的表取个别名 w1
,如果不取别名的话,两个 weather
表名会冲突。这里把 weather
作为今天表, w1
作为昨天表。
两表自连之后需要有连接条件,连接条件是 今天和昨天的日期。
MySQL 提供了datediff
函数,用来比较两个日期之间的时间差,如果两个时间之间相差 1
天,那么就是今天和做题。
最后在筛选出今天的温度高于昨天温度的数据。
select weather.id from weather join weather w1
on weather.recordDate = adddate(w1.recordDate, interval 1 day)
and weather.temperature > w1.temperature;
思路和方法一的思路是一样的,区别在于计算今天和昨天的方法不一样。
这里使用 MySQL 提供的 adddate
函数。这个函数是将日期函数一个规律进行偏移。
select id from (
select
temperature,
recordDate ,
lead(id, 1) over(order by recordDate) as id,
lead(recordDate, 1) over(order by recordDate) as 'nextDate',
lead(temperature, 1) over(order by recordDate) as 'nextTemp'
from weather
) temp
where nextTemp > temperature and datediff(nextDate, recordDate) = 1;
使用窗口函数 lead
,它是从后往前偏移,偏移量为 1
天。
select
temperature,
recordDate ,
lead(id, 1) over(order by recordDate) as nextId,
lead(recordDate, 1) over(order by recordDate) as 'nextDate',
lead(temperature, 1) over(order by recordDate) as 'nextTemp'
from weather;
id | recordDate | temperature | nextId | nextDate | nextTemp |
---|---|---|---|---|---|
1 | 2015-01-01 | 10 | 2 | 2015-01-02 | 25 |
2 | 2015-01-02 | 25 | 3 | 2015-01-03 | 20 |
3 | 2015-01-03 | 20 | 4 | 2015-01-04 | 30 |
4 | 2015-01-04 | 30 | null | null | null |
这里说一下,窗口函数还有一个 lag
是从前往后偏移的,用法和 lead
是一样的。这里就用 lead
来举例。
前三列是 weather
原数据,后三列是使用窗口函数 lead
算出来的数据。
为什么是偏移 1
呢?
因为比较的是今天和昨天,而且这里日期是连续的,所以用 1
。
然后查询出来的数据作为一个临时表 temp
。
筛选条件就是 nextTemp > temperature
,最后使用 datediff
比较两个日期差可写可不写,因为这里日期是连续的。
选出所有 bonus < 1000
员工的 name
及其 bonus
create table employee (
empId int,
name varchar(255),
supervisor int,
salary int
);
insert into employee values
(1, 'John', 3, 1000),
(2, 'Dan', 3, 1000),
(3, 'Brad', null, 1000),
(4, 'Thomas', 3, 1000);
create table bonus(
empId int,
bonus int
);
insert into bonus values
(2, 500),
(4, 2000);
select name, bonus from employee left join bonus on employee.empId = bonus.empId
where ifnull(bonus, 0) < 1000;
将 employee
和 bonus
通过 empId
左连,筛选出 bonus
小于 1000
的数据
判断 null
的方法:
ifnull()
isnull()
is null
写一个查询语句实现对大洲( continent
)列的 透视表 操作,使得每个学生按照姓名的字母顺序依次排列在对应的大洲下面。输出的标题应依次为美洲( America
)、亚洲( Asia
)和欧洲( Europe
)。
create table student (
name varchar(50),
continent varchar(7)
);
insert into student (name, continent) values
('Jane', 'America'),
('Pascal', 'Europe'),
('Xi', 'Asia'),
('Jack', 'America');
select America, Asia, Europe from (
select
name as America,
row_number() over(order by name) as rn
from student where continent = 'America'
) temp1 left join (
select
name as Europe,
row_number() over(order by name) as rn
from student where continent = 'Europe'
) temp2 on temp1.rn = temp2.rn left join (
select
name as Asia,
row_number() over(order by name) as rn
from student where continent = 'Asia'
) temp3 on temp1.rn = temp3.rn;
对 continent
实现透视操作,就是说把 continent
的每一行变成列,再合并重复的列。
要实现透视操作, continent
每个值按照 America
、 Europe
、 Asia
进行分类并排序。
排序可以使用 row_number()
。
将 America
、 Europe
、 Asia
值作为临时表 temp1
、 temp2
、 temp3
,使用左连依次将这三个表连接,连接条件是各表的 rn
select
America, Asia, Europe
from (select @america:=0, @europe:=0, @asia:=0) as init, (
select
name as America,
@america:= @america + 1 as america_id
from student where continent = 'America' order by name
) t1 left join (
select
name as Europe,
@europe:= @europe + 1 as europe_id
from student where continent = 'Europe' order by name
) t2 on america_id = europe_id left join (
select
name as Asia,
@asia:= @asia + 1 as asia_id
from student where continent = 'Asia' order by name
) t3 on america_id = asia_id;
方法二和方法一是一样的思路,这里的排序用的是变量。
方法一和方法二都有一个问题:需要知道哪个大洲的人数最多,人数多的表作为基准表,别的表与它相连,也就是说表连接的方式是从大到小。
select
max(case continent when 'America' then name else null end) America,
max(case continent when 'Asia' then name else null end) Aisa,
max(case continent when 'Europe' then name else null end) Europe
from (
select
*,
row_number() over(partition by continent order by name) as rn
from student
) temp group by rn;
方法三解决了需要先知道哪个大洲人数最多的问题。
使用 row_number()
按照 continent
分组并排序,序号为 rn
,作为临时表 temp
查询临时表 temp
按照 rn
分组
每一列使用 case ... when ... then ... end
查询出 America
、 Asia
、 Europe
,因为要使用到 group by
,所以需要使用到聚合函数, max
和 min
都可以。
无论 person
是否有地址信息,都需要基于上述两表提供 person
的一下信息: FirstName
, LastName
, City
, State
create table person (
personId int,
firstName varchar(255),
lastName varchar(255)
);
create table address (
addressId int primary key,
personId int,
city varchar(255),
state varchar(255)
);
select FirstName, LastName, City, State from Person
left join Address on Person.PersonId = Address.PersonId;
使用左连链接两表 Person
、 Address
,连接条件是 personId
。
编写一个 SQL 查询报告大国的国家名称、人口和面积。
满足下述任一条件之一,记为大国:
create table world (
name varchar(255),
continent varchar(255),
area int,
population int,
gdp bigint
);
insert world values
('afghanistan', 'asia', 652230, 25500100, 2034000000),
('albania', 'europe', 28748, 2831741, 12960000000),
('algeria', 'africa', 2381741, 37100000, 188681000000),
('andorra', 'europe', 468, 78115, 3712000000),
('angola', 'africa', 1246700, 20609294, 100990000000);
select name, population, area from world where
area >= 3000000 or population >= 25000000;
使用 or
连接两个条件
select name, population, area from world where area >= 3000000
union
select name, population, area from world where population >= 25000000;
将两个条件分别查询,使用 union
将两次查询连接起来。
经理也属于员工。每个员工都有一个 Id,此外还有一列对应员工的经理的 Id
。查询收入超过他们经理的员工的姓名
create table employee (
id int,
name varchar(255),
salary int,
managerId int
);
insert into employee values
(1, 'Joe', 70000, 3),
(2, 'Henry', 80000, 4),
(3, 'Sam', 60000, null),
(4, 'Max', 90000, null);
select employee.name from employee left join employee e
on employee.managerId = e.id
where employee.salary > e.salary;
managerId
是经理 id
,同时经理也是员工,也就是说没有 managerId
是普通员工,有 managerId
的是经理。
所以将 employee
自连接,连接条件是 employee.managerId = e.id
,就可以把普通员工和经理连接起来了。
然后在筛选出 employee.salary > e.salary
的员工就行了。
找出影片中非 boring
并且 id
为奇数的影片,结果按照等级 rating
排序
create table cinema (
id int,
movie varchar(255),
description varchar(255),
rating float
);
insert into cinema values
(1, 'War', 'great 3D', 8.9),
(2, 'Science', 'fiction', 8.5),
(3, 'irish', 'boring', 6.2),
(4, 'Ice song', 'Fantacy', 8.6),
(5, 'House card', 'Insteresting', 9.1);
select * from cinema where description != 'boring' and mod(id, 2) = 1
order by rating desc;
如何计算奇数呢?
mod(x, 2) = 1
,如果余数是 1
就是奇数。power(-1, x) = -1
, 如果结果是 -1
就是奇数x % 2 = 1
,如果余数是 1
就是奇数。x & 1 = 1
,如果是 1
就是奇数x regexp '[1, 3, 5, 7, 9]$' = 1
如果为 1
就是奇数x>>1<<1 != x
如果右移一位在左移一位不等于原值,就是奇数查找推荐人的编号不是 2
create table customer(
id int,
name varchar(255),
referee_id int
);
insert into customer values
(1, 'Will', null),
(2, 'Jane', null),
(3, 'Alex', 2),
(4, 'Bill', null),
(5, 'Zack', 1),
(6, 'Mark', 2);
select name from customer where ifnull(referee_id, 0) != 2 or referee_id != 2;
这题的主要考点在于 SQL 中判断 null
的方法
判断 referee_id
是 null
的方法:
is null
isnull()
ifnull()
any
、 all
、 some
是子查询关键词之一,必须与一个比较操作符进行一起使用。
any
和子查询返回的列中 任一值 比较为 true
则返回为 true
。all
和子查询返回的列中 所有值 比较为 true
则返回为 true
。some
的用法和 any
一样,在 !=
的场景中,用 any
容易产生误解,用 some
更容易理解create table t1 (id int, value int);
create table t2 (id int, value int);
insert into t1 values(1, 10), (2, 300), (3, 40), (4, 60), (5, 70), (6, 80);
insert into t2 values(1, 100), (2, 300), (3, 40), (4, 600), (5, 70), (6, 800);
all
方法select * from t1 where value <= all(
select value from t2
);
t2
表中所有的 value
,结果为 (100, 300, 40, 600, 70, 800)
t1
表中筛选 value <= all(100, 300, 40, 600, 70, 800)
value = 10
,它小于等于 (100, 300, 40, 600, 70, 800)
里所有的值,结果为 true
value = 300
它没有小于等于 (100, 300, 40, 600, 70, 800)
里所有的值,结果为 false
true
null
,结果为 false
,也就是说查不出结果null
,结果为 false
,和结果为空不是一个概念any
方法select * from t1 where value <= any(
select value from t2
);
t2
表中所有的 value
,结果为 (100, 300, 40, 600, 70, 800)
t1
表中筛选 value <= (100, 300, 40, 600, 70, 800)
value = 10
,它小于等于 (100, 300, 40, 600, 70, 800)
里任一的值,结果为 true
value = 300
它没有小于等于 (100, 300, 40, 600, 70, 800)
里任一的值,结果为 true
false
,也就是说查不出结果null
,结果为 false
some
用法select * from t1 where value != some(
select value from t2
);
t1
表中有部分 value
与 t2
表中的 value
不相等
如果用 any
就会理解成:t1
表中的 value
与 t2
表中的任意 value
不相等。
他们结果是一样的。
查找 person
表中所有重复的电子邮箱
create table person (
id int,
email varchar(255)
);
insert into person values(1, '[email protected]'), (2, '[email protected]'), (3, '[email protected]');
select email from person group by email having count(email) > 1;
通过 group by
对 email
分组,在使用 having
将重复的 email
筛选出来。
select t.email from (
select email, count(email) num from person group by email
) t where t.num > 1;
将 email
分组后计算出相同 email
的数量作为临时表,筛选出 num > 1
的邮箱
with temp as (
select email, count(email) num from person group by email
)
select email from temp where num > 1;
使用 with
建立临时表,和方法二一样
select distinct person.email from person
left join person temp on person.email = temp.email
where person.id != temp.id;
使用 left join
自连,连接条件是 person.eamil = temp.email
并且通过 where
把 person.id != temp.id
筛选出来,最后再通过 distinct
去重
一条完整的 select
语句执行顺序:
from
子句组装数据(包括 on
连接)where
子句进行条件筛选group by
分组having
筛选分组select
的字段order by
排序limit
筛选筛选过去一年订单总量少于 10 本的书籍,不考虑上架不满一个月(时间:2019-06-23)的书籍。
create table books (
book_id int,
name varchar(255),
available_from date
);
insert into books values
(1, 'Kalila And Demna', '2010-01-01'),
(2, '28 Letters', '2012-05-12'),
(3, 'The Hobbit', '2019-06-10'),
(4, '13 Reasons Why', '2019-06-01'),
(5, 'The Hunger Games', '2008-09-21');
create table orders (
order_id int,
book_id int,
quantity int,
dispatch_date date
);
insert into orders values
(1, 1, 2, '2018-07-26'),
(2, 1, 1, '2018-11-05'),
(3, 3, 8, '2019-06-11'),
(4, 4, 6, '2019-06-05'),
(5, 4, 5, '2019-06-20'),
(6, 5, 9, '2009-02-02'),
(7, 5, 8, '2010-04-13');
这题目有 2
个问题容易忽略:
orders
表中没有销量的书orders
表中在指定时间内没有销量的书
books
左连 orders
,容易将 dispatch_date
的判断放在 where
中,以至于筛选不出指定时间内没有销量的书这题的方法有很多,都是 books
连接 orders
,却别是在 ①、②、③、④ 写法不一样
select book_id, name from ① left join ② on ③ where ④;
select books.book_id, books.name from books left join (
select book_id, sum(quantity) as total_sales
from orders where datediff('2019-06-23', dispatch_date) < 365
group by book_id
) as temp on books.book_id = temp.book_id
where datediff('2019-06-23', available_from) > 30
and (total < 10 or isnull(total_sales));
select temp1.book_id, temp1.name from (
select book_id, name from books
where datediff('2019-06-23', available_from) > 30
) as temp1 left join (
select book_id, sum(quantity) as total_sales from orders
where datediff('2019-06-23', dispatch_date) < 365
group by book_id
) as temp2 on temp1.book_id = temp2.book_id
where ifnull(total_sales, 0) < 10;
orders
进行分组dispatch_date
的筛选就可以放在 orders
中了ifnull()
isnull()
is null
select books.book_id, books.name from books left join orders
on books.book_id = orders.book_id
and ifnull(datediff('2019-06-23', dispatch_date), 0) < 365
where datediff('2019-06-23', available_from) > 30
group by books.book_id, name having ifnull(sum(quantity), 0) < 10;
select temp1.book_id, temp1.name from (
select book_id, name from books
where datediff('2019-06-23', available_from) > 30
) as temp1 left join orders on temp1.book_id = orders.book_id
and ifnull(datediff('2019-06-23', dispatch_date), 0) < 365
group by temp1.book_id, temp1.name having ifnull(sum(quantity), 0) < 10;
having
筛选dispatch_date
要放在 on
当中,不能放在后面的 where
中查找每个公司的薪水中位数(需要不使用内置函数)
create table employee (
id int,
company varchar(10),
salary int
)
insert into employee values
(1, 'A', 2341),
(2, 'A', 341),
(3, 'A', 15),
(4, 'A', 15314),
(5, 'A', 451),
(6, 'A', 513),
(7, 'B', 15),
(8, 'B', 13),
(9, 'B', 1154),
(10, 'B', 1345),
(11, 'B', 1221),
(12, 'B', 234),
(13, 'C', 2345),
(14, 'C', 2645),
(15, 'C', 2645),
(16, 'C', 2652),
(17, 'C', 65);
select id, company, salary from (
select
id, company, salary,
row_number() over(partition by company order by salary) as 排名,
count(id) over(partition by company) as total
from employee
) as temp
where temp.排名 in (floor((total + 1) / 2), floor((total + 2) / 2));
row_number()
计算排名,并按照 company
分组, salary
升序company
分组,并计算总数floor((total + 1) / 2)
和 floor((total + 2) / 2)
, floor
是想下取整
total = 6
,中位数是 3
和 4
,这里计算的结果正是 3
和 4
total = 5
,中位数是 3
,这里计算的两个值分别是 3
和 3
where 排名 >= total / 2 and 排名 <= total / 2 + 1
total = 6
,中位数是 3
和 4
, 排名 ≥ 3 and 排名 ≤ 4
,筛选出来的是 3
和 4
total = 5
,中位数是 3
, 排名 ≥ 2.5 and 排名 ≤ 3.5
,筛选出来的就是 3
select id, company, salary from (
select
id,
company,
salary,
if(@prev = company, @r:=@r+1, @r:=1) as 排名,
@prev:=company
from employee, (select @r:=0, @prev:=0) init,
order by company, salary, id
) as temp1 join (
select count(*) as total, company from employee group by company
) as temp2 using(company) where 排名 >= total / 2 and 排名 <= total / 2 + 1;
和方法一的思路一样,这里是用变量来实现 salary
排名
with temp as (
select e1.id from employee e1 join employee e2 using(company)
group by e1.id
having sum(e1.salary >= e2.salary) >= count(e1.id) / 2
and sum(e1.salary <= e2.salary) >= count(e1.id) / 2
)
select id, company, salary from employee where exists (
select id from temp where employee.id = temp.id
);
**思路:**将每个人和公司的其他所有人一一比较,将 employee
通过 company
自连,并且按照 e1.id
进行分组
筛选:
sum(e1.salary >= e2.salary) >= count(e1.id) / 2
A
公司为例, A
公司有 6
名员工,所以通过 company
连接后,一共有 36
条数据,因为每一条数据都要和自身进行连接,如下图所示。
e1.id | e1.salary | e2.id | e2.salary |
---|---|---|---|
6 | 513 | 1 | 2341 |
5 | 451 | 1 | 2341 |
4 | 15314 | 1 | 2341 |
3 | 15 | 1 | 2341 |
2 | 341 | 1 | 2341 |
1 | 2341 | 1 | 2341 |
6 | 513 | 2 | 341 |
... | ... | ... | ... |
group by e1.id
分组后,就去掉了重复的 e1.id
sum(e1.salary >= e2.salary)
将 e1.id.salary
和每个 e2.id.salary
比较,计算出 e1.id.salary
大于等于 e2.id.salary
的有几个e1.id
分组,所以 count(e1.id)
计算出有多少个 id
,也就是说和几个人进行比较(或者说是公司的总人数)salary(e1.salary <= e2.salary) >= count(e1.id) / 2
sum(...)
这步计算的就比总人数的一半要大于,也就是 sum(...) > count(e1.id) / 2
sum(...)
这步计算的就比总人数的一半要小于,也就是 sum(...) < count(e1.id) / 2
sum(...)
这步计算的就等于总人数的一半,也就是 sum(...) = count(e1.id) / 2
无法去除最后重复的中位数,因为这里是按照员工 id
进行分组的。
在 MySQL 8.0
中使用 group by
需要和 select
的字段一致,所以当要查看连接后表中其他字段时,可以用 any_value()
理不清思路时可以把筛选条件放到 select
中,查询出来在比对自己的思路,比如说 sum(e1.salary >= e2.salary)
和 count(e1.id)
select
sum(e1.salary >= e2.salary),
count(e1.id),
e1.id,
any_value(e1.salary),
any_value(e1.company),
any_value(e2.id),
any_value(e2.salary),
any_value(e2.company)
from employee e1 join employee e2 using(company)
group by e1.id
select
any_value(e1.id) as id,
e1.company as company,
e1.salary as salary
from employee e1 left join employee e2 using(company)
group by e1.company, e1.salary
having sum(
case when e1.salary = e2.salary then 1 else 0 end
) >= abs(sum(sign(e1.salary - e2.salary)))
order by id;
**思路:**将每个人和公司的其他所有人一一比较,将 employee
通过 company
自连,并且按照 e1.company
和 e1.salary
进行分组
筛选:
sum(case when e1.salary = e2.salary then 1 else 0 end)
abs(sum(sign(e1.salary - e2.salary)))
sign
用来确定一个数是正数、负数、还是零,这里以 A
公司的 id = 1
的员工为例
e1.salary=2341
,e2.salary=451
,sign(e1.salary-e2.salary)
结果为 1
e1.salary=2341
,e2.salary=15314
,sign(e1.salary-e2.salary)
结果为 -1
e1.salary=2341
,e2.salary=15
,sign(e1.salary-e2.salary)
结果为 1
e1.salary=2341
,e2.salary=341
,sign(e1.salary-e2.salary)
结果为 1
e1.salary=2341
,e2.salary=2341
,sign(e1.salary-e2.salary)
结果为 0
e1.salary=2341
,e2.salary=513
,sign(e1.salary-e2.salary)
结果为 1
e1.id | e1.salary | e2.id | e2.salary |
---|---|---|---|
1 | 2341 | 5 | 451 |
1 | 2341 | 4 | 15314 |
1 | 2341 | 3 | 15 |
1 | 2341 | 2 | 341 |
1 | 2341 | 1 | 2341 |
1 | 2341 | 6 | 513 |
2 | 341 | 6 | 513 |
... | ... | ... | ... |
sum()
对上面的 sign(...)
进行求和为 4
abs()
求出 sum(...)
的绝对值sign(e1.salary-e2.salary)
大于 1
sign(e1.salary-e2.salary)
小于 1
sign(e1.salary-e2.salary)
等于 1
(ps:如果有几个人的工资相等,并且是中位数,那么这里 1
就是对应的工资相等的人数)order by id
对 id
进行升序排序sum(case when e1.salary = e2.salary then 1 else 0 end) >= abs(sum(sign(e1.salary - e2.salary)))
这里用 >=
是因为如果有几个人工资相等时 sum(case when e1.salary = e2.salary then 1 else 0 end)
会大于工资相等的人数sum(case when e1.salary = e2.salary then 1 else 0 end)
和sum(sign(e1.salary - e2.salary))
查询出来,理清思路
select
sum(case when e1.salary = e2.salary then 1 else 0 end),
sum(sign(e1.salary - e2.salary)),
any_value(e1.id) as id,
e1.company as company,
e1.salary as salary,
any_value(e2.id),
any_value(e2.company),
any_value(e2.salary)
from employee e1 left join employee e2 using(company)
group by e1.company, e1.salary
order by id;
编写一个 SQL 查询,按产品 product_id
来统计每个产品的销售总量。
sales
表中的主键是 sale_id
外键是 product_id
product
表中的主键是 product_id
输出字段 product_id
和 total_quantity
CREATE TABLE sales (
sale_id INT,
product_id INT,
year INT,
quantity INT,
price INT
);
INSERT INTO sales ( sale_id, product_id, year, quantity, price ) VALUES
( 1, 100, 2008, 10, 5000 ),
( 2, 100, 2009, 12, 5000 ),
( 7, 200, 2011, 15, 9000 );
CREATE TABLE product (
product_id INT,
product_name VARCHAR ( 10 )
);
INSERT INTO product ( product_id, product_name ) VALUES
( 100, 'Nokia' ),
( 200, 'Apple' ),
( 300, 'Samsung' );
sales
表中就有每个产品的销量 quantity
,而且输出的字段也都在 sales
表中,所以这里 product
表没有作用。
select
product_id,
sum(quantity) total_quantity
from sales group by product_id;
group by
对 product_id
分组sum()
计算 quantity
select
distinct product_id,
sum(quantity) over(partition by product_id) total_product
from sales;
product_id
进行分组sum()
计算 quantity
编写 SQL 语句,对于每个员工,查询他除最近一个月(即最大月)之外,剩下每个月的近三个月的累计薪水(不足三个月也要计算)。
结果请按 id
升序,然后按 month
降序显示。
create table employee(
id int,
month int,
salary int
)
insert into employee values
(1, 1, 20),
(2, 1, 20),
(1, 2, 30),
(2, 2, 30),
(3, 2, 40),
(1, 3, 40),
(3, 3, 60),
(1, 4, 60),
(3, 4, 70);
select
e1.id,
e1.month,
(ifnull(e1.salary, 0) + ifnull(e2.salary, 0) + ifnull(e3.salary, 0)) salary
from employee e1
left join employee e2 on e2.id = e1.id and e2.month = e1.month - 1
left join employee e3 on e3.id = e1.id and e3.month = e1.month - 2
join (
select id, max(month) max_month from employee group by id
) e4 on e4.id = e1.id and e4.max_month > e1.month
order by e1.id, e1.month desc;
因为要累计计算最近三个月的薪资,所以就要连续自连三次,自连的条件是 id
和 month - 1
和 month - 2
select * from employee e1
left join employee e2 on e2.id = e1.id and e2.month = e1.month - 1
left join employee e3 on e3.id = e1.id and e3.month = e1.month - 2
order by e1.id, e1.salary desc;
这里如果使用 e2.month - 1 = e1.month
和 e3.month - 2 = e1.month
会有问题,因为我们这里最后会计算 e1.salary + e2.salary + e3.salary
。
这里以 id = 1
举例:
这是 e2.month = e1.month - 1
和 e3.month = e1.month - 2
的结果
e1.id | e1.month | e1.salary | e2.id | e2.month | e2.salary | e3.id | e3.month | e3.salary |
---|---|---|---|---|---|---|---|---|
1 | 4 | 60 | 1 | 3 | 40 | 1 | 3 | 30 |
1 | 3 | 40 | 1 | 2 | 30 | 1 | 2 | 20 |
1 | 2 | 30 | 1 | 1 | 20 | null | null | null |
1 | 1 | 20 | null | null | null | null | null | null |
这是 e2.month - 1 = e1.month
和 e3.month - 2 = e1.month
的结果
e1.id | e1.month | e1.salary | e2.id | e2.month | e2.salary | e3.id | e3.month | e3.salary |
---|---|---|---|---|---|---|---|---|
1 | 4 | 60 | null | null | null | null | null | null |
1 | 3 | 40 | 1 | 4 | 60 | null | null | null |
1 | 2 | 30 | 1 | 3 | 40 | 1 | 4 | 60 |
1 | 1 | 20 | 1 | 2 | 30 | 1 | 3 | 40 |
使用 e2.month - 1 = e1.month
的问题是, e2
表的第二行不是 2
月份的数据,而是 4
月份的数据。
找出每个 id
最大的月份,存储为临时表 e4
select id, max(month) max_month from employee group by id;
将第一步的结果使用 join
连接 e4
,筛选条件 id
和 e1.month < e4.max_month
select * from employee e1
left join employee e2 on e2.id = e1.id and e2.month = e1.month - 1
left join employee e3 on e3.id = e1.id and e3.month = e1.month - 2
join (
select id, max(month) max_month from employee group by id;
) e4 on e4.id = e1.id and e4.max_month > e1.month
order by e1.id, e1.salary desc;
这里为什么使用 left join
而不是使用 join
,因为要去除掉最大月份,使用 e1.month < e4.max_month
时会有一条数据是 null
,使用 join
能够去除掉这条数据。
使用 ifnull
判断是否每张表的 salary
是否为 null
,并且相加,就是每个月薪资累加的和。
(ifnull(e1.salary, 0) + ifnull(e2.salary, 0) + ifnull(e3.salary, 0)) salary
有个问题,如果月份不连续,比如下面的数据 7月
和 4月
不连续,使用这个方法计算 7月
的累计薪资是 7月
、 6月
、 5月
,不会计算到 4月
的薪资。
insert into employee values
(1,1,20),
(1,2,30),
(1,3,40),
(1,4,60),
(1,7,90),
(1,8,90),
(2,1,20),
(2,2,30),
(3,2,40),
(3,3,60),
(3,4,70);
select
id, month,
sum(salary) over(partition by id
order by month desc
range between 0 preceding and 2 following
) salary
from employee where (id, month) not in (
select id, max(month) month from employee group by id
);
select
id, month,
sum(salary) over(partition by id
order by month
range 2 preceding
) salary
from employee where (id, month) not in (
select id, max(month) month from employee group by id
)
order by id, month desc;
select
id, month,
sum(salary) over(partition by id order by month range 2 preceding) salary
from (
select
id, month, salary,
row_number() over(partition by id order by month desc) rk
from employee
) e1 where rk >= 2
order by id, month desc;
这三种写法都是一样的,上面两种是通过 not in
的方式去除掉当前最大月份,第三种是通过使用 row_number()
窗口函数去掉最大月份。
select
e1.id id, e1.month month,
sum(e2.salary) salary
from employee e1, employee e2
where e1.id = e2.id
and e1.month >= e2.month
and e1.month < e2.month + 3
and (e1.id, e1.month) not in (
select id, max(month) month from employee group by id
)
group by e1.id, e1.month
order by e1.id, e1.month desc;
两表查询 employee e1, employee e2
查询条件 e1.id = e2.id
这好理解,e1.month >= e2.month and e1.month < e2.month + 3
怎么理解呢,最近三个月就加 3 吗?这就要配合 sum(e2.salary)
来理解了
我们先看第一个筛选条件 e1.id = e2.id
,只需要两个相同 id
的数据
select
e1.id, e1.month,
any_value(e1.salary),
any_value(e2.id),
any_value(e2.month),
any_value(e2.salary),
sum(e2.salary) salary,
group_concat(e2.salary),
from employee e1, employee e2
where e1.id = e2.id
group by e1.id, e1.month
order by e1.id, e1.month desc;
从输出结果可以看出 e1.id = 1, e1.month = 4
的 salary
是由 group_concat(e2.salary)
的 4 个数相加,这 4 个值是 e2.id = 1, e2.month = 4,3,2,1
的 salary
。
增加筛选条件 e1.month >= e2.month
,累加当前月份的 salary
和之前所有所有月份的 salary
select
e1.id, e1.month,
any_value(e1.salary),
any_value(e2.id),
any_value(e2.month),
any_value(e2.salary),
sum(e2.salary) salary,
group_concat(e2.salary),
from employee e1, employee e2
where e1.id = e2.id
and e1.month > e2.month
group by e1.id, e1.month
order by e1.id, e1.month desc;
从结果中可以看出 e1.id = 1, e1.month = 2
的 salary
由 group_concat(e2.salary)
的 2 个数相加,这两个值是有 e2.id = 1, e2.month = 2,1
的 salary
。
增加筛选提条件 e1.month < e2.month + 3
,只累加当前月份以及前 2 个月的 salary
,这一步再加个 group_concat(e2.month)
,帮助我们更好理解。
select
e1.id, e1.month,
any_value(e1.salary),
any_value(e2.id),
any_value(e2.month),
any_value(e2.salary),
sum(e2.salary) salary,
group_concat(e2.salary),
group_concat(e2.month)
from employee e1, employee e2
where e1.id = e2.id
and e1.month >= e2.month
and e1.month < e2.month + 3
group by e1.id, e1.month
order by e1.id, e1.month desc;
加 e1.month < e2.month + 3
, group_concat(e2.month)
的结果
加了 e1.month < e2.month + 3
, group_concat(e2.month)
的结果
通过第二步筛选,得到所有的数据 e1.month
一定大于等于 e2.month
,因为要的结果是最近三个月的 salary
累加,所以上面一张图中的 1,4,2,3
是不符合要求的。如何把这些月份过滤掉用的就是 e1.month < e2.month + 3
,比如当前月份是 8
月份,那个最近的 3 个月是 8,7,6
, 8<8+3, 8<7+3, 8<6+3
。
去除最大月份,分组查询一次就能得到。
找出所有从不订购任何东西的客户
create table customers (
id int primary key auto_increment,
name varchar(255)
);
insert into customers (name) values('Joe'), ('Henry'), ('Sam'), ('Max');
create table orders (
id int primary key auto_increment,
customerId int
);
insert into orders (customerId) values(3),(1);
select name from customers left join orders
on customers.id = orders.customerId where isnull(customerId);
使用 left join
连接 customers
和 orders
连接条件是 customers.id = orders.customersId and isnull(customersId)
select name from customers where id not in (
select customerId from orders
);
使用 not in
查出不在这些结果中的数据。
select name from customers where not exists (
select customerId from orders where customerId = customers.id
);
使用 not exists
代替 not in
找出每个部门工资最高的员工。
create table employee (
id int,
name varchar(255),
salary int,
departmentId int
);
insert into employee values
(1,'Joe', 70000, 1),
(2,'Jim', 90000, 1),
(3,'Henry', 80000, 2),
(4,'Sam', 60000, 2),
(5,'Max', 90000, 1);
create table department(id int, name varchar(255));
insert into department values(1, 'it'), (2, 'sales');
select department.name, employee.name, salary from employee
join department on employee.departmentId = department.id
where (departmentId, salary) in (
select departmentId, max(salary) from employee group by departmentId
);
两个字段也可以用 in
。
with temp as (
select department.name as department, employee.name as employee, salary,
dense_rank() over(partition by departmentId order by salary desc) as 排名
from employee left join department on employee.departmentId = department.id
) select department, employee, salary from temp where 排名 = 1;
with
建立临department
和 employee
使用 dense_rank()
对 salary
进行排序。
partition by
的作用是分区求出每一薪资发放日,每个部门的平均薪资与公司的平均薪资比较的结果(高/低/相同)
create table salary (id int, employee_id int, amount int, pay_date date);
insert into salary (id, employee_id, amount, pay_date) values
('1', '1', '9000', '2017/03/31'),
('2', '2', '6000', '2017/03/31'),
('3', '3', '10000', '2017/03/31'),
('4', '1', '7000', '2017/02/28'),
('5', '2', '6000', '2017/02/28'),
('6', '3', '8000', '2017/02/28');
create table employee (employee_id int, department_id int);
insert into employee (employee_id, department_id) values
('1', '1'),
('2', '2'),
('3', '2');
select
distinct date_format(pay_date, '%Y-%m') as pay_month,
department_id,
(case when avg_department > avg_company then 'higher'
when avg_department < avg_company then 'lower'
else 'same'
end) as comparison
from (
select
pay_date,
department_id,
avg(amount) as avg_department
from salary join employee using(employee_id)
group by pay_date, department_id
) as temp1 join (
select pay_date, avg(amount) as avg_company from salary group by pay_date
) as temp2 using(pay_date)
--- 等价于
select
pay_month,
department_id,
(case when avg_department > avg_company then 'higher'
when avg_department < avg_company then 'lower'
else 'same'
end) as comparison
from (
select
date_format(pay_date, '%Y-%m') as pay_month,
department_id,
avg(amount) as avg_department
from salary join employee using(employee_id)
group by pay_month, department_id
) as temp1 join (
select
date_format(pay_date, '%Y-%m') as pay_month,
avg(amount) as avg_company
from salary group by pay_month
) as temp2 using(pay_month)
这题有两个重点:
计算部门每个月的平均薪资,将 salary
和 employee
用 employee_id
连接,并且按照 ,计算出部门薪资平均值 avg_department
, pay_month
和 department_id
进行分组,将它作为临时表 temp1
计算公司每个月的平均薪资比较简单,直接对 salary
表按照 pay_date
进行分组,并且计算出公司薪资平均值 avg_company
,将它作为临时表 temp2
将 temp1
和 temp2
用 pay_date
连接起来,使用 case ... when ... end
语句比较avg_department
和 avg_company
的大小后输出 same
、 higher
、 lower
因为这里输出的都是日期 date
,所以这里要使用 date_format()
对它进行日期格式化。
这里要注意一点的是因为 temp1
和 temp2
都是用 date
分组的,而最后查出来的数据只需要月份,所以这里可能会出现重复的数据,需要在最后使用 distinct
去重,或者呢在 temp1
和 temp2
是就直接使用 month
进行分组。
查询至少有一个订单金额大于 500 的客户数量。
create table store (
bill_id int,
customer_id int,
amount int
);
insert into store values
(6, 1, 549),
(8, 1, 834),
(4, 2, 394),
(11, 3, 657),
(13, 3, 257);
select count(distinct customer_id) as rich_count from store where amount > 500;
500
的数据customer_id
去重计数select count(*) as rich_count from (
select distinct customer_id from store
group by customer_id having max(amount) > 500
) as temp;
customer_id
分组,并筛选出 amount
大于 500
的客户,作为临时表 temp
temp
计数with temp as (
select distinct customer_id from store
group by customer_id having max(amount) > 500
) select count(*) as rich_count from temp
方法二的另一种写法,使用 with
创建临时表
select count(*) as rich_count from (
select
distinct customer_id,
max(amount) over(partition by customer_id) as `max`
from store
) as temp where `max` > 500;
temp
temp
筛选出大于 500
的进行计数删除 person
表中所有重复的电子邮箱,重复的邮箱只保留 id
最小的那个。
create table person (
id int,
email varchar(255)
)
insert into person values
(1, '[email protected]'),
(2, '[email protected]'),
(3, '[email protected]');
delete person from person
join person p2
on person.email = p2.email and person.id > p2.id;
自连接,连接条件是 person.email = p2.email
筛选出 person.id > p2.id
,这个是要删除的。
delete person from person, person p2
where person.email = p2.email and person.id > p2.id;
和方法一一样。
delete person from person,
(
select min(id) as id, email from person group by email having count(email) > 1) as p2
where person.email = p2.email and person.id != p2.id;
person
安装 email
进行分组,并用 having
筛选出重复的最小 id
person
进行一起联查,条件是 person.email = p2.email
并且不能和临时表的 id
相同,结果就是要删除的数据。A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.