那些年，背過的SQL題

數據分析那些事

51 min readApr 15, 2024

大家好，我是數據君，週末看到朋友轉發的⼀篇文章：《那些年，背過的SQL面試題》。

感嘆失敗的原因可能有很多，而做成的道路只有⼀條，那就是不斷積累。純手工的8291字的SQL面試題總結分享給初學者，俗稱八股文，期待對新手有所幫助。

視窗函式題

視窗函式其實就是根據當前資料, 計算其在所在的組中的統計資料。

視窗函式和group by得區別就是，groupby的聚合對每一個組只有一個結果，但是視窗函式可以對每一條資料都有一個結果。

商品類別資料集

一. 從資料集中得到每個類別收入第一的商品和收入第二的商品。

思路：計算每一個類別的按照收入排序的序號，然後取每個類別中的前兩個資料。

總結答案：

SELECT
  product,
  category,
  revenue
FROM (
  SELECT
    product,
    category,
    revenue,
    dense_rank() OVER w as 'rank'
  FROM productRevenue
WINDOW w as (PARTITION BY category ORDER BY revenue DESC)
) tmp
WHERE
  'rank' <= 2;

按照類別進行分組，且每個類別中的資料按照收入進行排序，併為排序過的資料增加編號：

SELECT product,
 category,
 revenue,
 dense_rank() OVER w as 'rank'
FROM productRevenue
WINDOW w as (PARTITION BY category ORDER BY revenue DESC);

2. 根據編號，取得每個類別中的前兩個資料作為最終結果；

二. 統計每個商品和此品類最貴商品之間的差值

目標數據集

總結答案：

SELECT
    product,
    category,
    revenue,
    MAX(revenue) OVER  w - revenue as revenue_difference
FROM productRevenue
WINDOW w as (PARTITION BY category ORDER BY revenue DESC);

首先創建窗口，按照類別進行分組，並對價格倒敘排列；
應用窗口，求出每個組內的價格最大值，對其減去商品的價格，起別名。

用戶表（時長，用戶id）

查詢某一天中時長最高的60% 用戶的平均時長
總結答案：

with aa as(
  select
    *,
    row_number() over(
      order by
        時長 desc
    ) as rank_duration
  from
    表
  where
    package_name = 'com.miHoYo.cloudgames.ys'
    and date = 20210818
)
select
  avg(時長)
from
  aa
where
  rank_duration <= (
    select
      max(rank_duration)
    from
      aa
  ) * 0.6;

這是排名問題，排名問題可以考慮用視窗函式去解決。

將問題拆分為：

1）找出時長前60%的使用者；

2）剔除訪問次數前20%的使用者

首先找某天的資料，按時長降序從高到低進行排名，注意要用row_number，相相等的話也會往後算數：

select
    *,
    row_number() over(
      order by duration desc
    ) as rank_duration
  from
    表
where
    package_name = 'com.miHoYo.cloudgames.ys'
    and date = 20210818;

排完名後，要找出前60%的使用者：

**使用者排名值<=最大的排名值 * 60%**，就是前60%的使用者資料。

最大的排名值透過max(排名)來得到。

把排名結果表作為臨時表，但是要注意的是，臨時表只能用其中的欄位，但是不能當作表來用。所以需要用with as語句將排名作為臨時表。

使用者簽到表

有一張使用者簽到表【t_user_attendence】，標記每天使用者是否簽到（說明：該表包含所有使用者所有工作日的出勤記錄），包含三個欄位：日期【fdate】，使用者 id【fuser_id】，使用者當天是否簽到【fis_sign_in：0 否 1 是】

計算截至當前每個使用者已經連續簽到的天數
計算最近一次未簽到的日期，再用當前日期減去那個日期

select
  fuser_id,
  datediff(CURDATE(), fdate_max) as fcon,
  secutive_days
from
  (
    select
      fuser_id,
      max(fdate) as fdate_max
    from
      t_user_attendence
    where
      fis_sign_in = 0
    group by
      fuser_id
  ) t1;

請計算每個使用者歷史以來最大的連續簽到天數
思路1：把相同數值進行分組再自然連續排序，兩個排序相減得到差值 t，若數值連續，則差值 t 相等。

先按人分組按天進行自然連續排序，再只取簽到部分，按人分組進行自然連續排序，相差得到差值 diff1；再按照差值 diff1 分組計數，得到每人連續簽到的天數，求最大值即可。

SELECT
  fuser_id,
  max(ct) as max_ct
FROM
  (
    SELECT
      fuser_id,
      diff1,
      count(diff1) as ct
    FROM
      (
        SELECT
          *,
          row_number() over (
            PARTITION by fuser_id
            ORDER BY
              fdate
          ) as or2,
          or1 - row_number() over (
            PARTITION by fuser_id
            ORDER BY
              fdate
          ) as diff1
        FROM
          (
            SELECT
              fdate,
              fuser_id,
              fis_sign_in,
              row_number() over (
                partition by fuser_id
                order by
                  fdate
              ) as or1
            from
              t_user_attendence
            order by
              fuser_id,
              fdate
          ) t
        where
          fis_sign_in = 1
      ) t2
    GROUP BY
      fuser_id,
      diff1
  ) t3
GROUP BY
  fuser_id;

思路2：把相同數值進行分組再自然連續排序，兩個排序相減得到差值 t，若數值連續，則差值 t 相等。

SELECT
  log_id, log_date
  max(ct) as max_ct
FROM
  (
    SELECT
      log_id,
      diff1,
      count(diff1) as ct
      log_date
    FROM
      (
        SELECT
          *,
          row_number() over (
            PARTITION by log_id
            ORDER BY
              log_date
          ) as or2,
          or1 - row_number() over (
            PARTITION by log_id
            ORDER BY
              log_date
          ) as diff1
        FROM
          (
            SELECT
              log_id,
              log_date,
              row_number() over (
                partition by log_id
                order by
                  log_date
              ) as or1
            from
              log_info
            order by
              log_id,
              log_date
          ) t
      ) t2
    GROUP BY
      log_id,
      diff1,
      log_date
  ) t3
GROUP BY
  fuser_id, log_date;

使用者行為資訊表

給你一個表，表中有兩列資料：date和user_id，計算次日留存使用者數

1.基礎版本
**答案 — **產出結果：第一列為時間，第二列為次日留存使用者數；（簡單實現）

SELECT  b_time_load, COUNT(DISTINCT case when diff=1 then id else null end)as liucun_num
FROM
(
    SELECT *, TIMESTAMPDIFF(DAY,a_time_load, b_time_load) as diff
    FROM
    (
        SELECT a.id, a.time_load as a_time_load,b.time_load as b_time_load
        FROM user_move as a
        LEFT JOIN user_move as b
        on a.id = b.id
    )as c
)as d
GROUP BY b_time_load;

實現思路：

次日留存使用者數：在今日登入，明天也有登入的使用者數。也就是時間間隔 = 1。

當一個表如果涉及到時間間隔，就需要用到自聯結，也就是將相同的表進行聯結。

第一步：因為要算時間間隔，因此需要先對錶進行自聯結：
select a.使用者id,a.登陸時間,b.登陸時間
from 使用者行為資訊表 as a  
left join 使用者行為資訊表 as b
on a.使用者id = b.使用者id
where a.應用名稱= '相機';
// 根據條件看是否要加最後一句where。
第二步：計算兩個日期的差值--
select *,timestampdiff(day,a.登陸時間,b.登陸時間) as 時間間隔
from c;
第三步：用case選出時間間隔為1的資料：
count(distinct case when 時間間隔=1 then 使用者id else null end) as  次日留存數

2.最佳化點
考慮表的日期分割槽
考慮對使用者的去重，比如一天內登入多次的情況
考慮count distinct的效率，在前面就對diff = 1當作條件過濾，而不選擇加在case when裡做處理；

同上，求次日留存率

留存率 = 新增使用者****中登入使用者數 / 新增使用者數，所以次日留存率 = 次日留存使用者數 / 當日使用者活躍數；
當日活躍使用者數是 count(distinct 使用者 id)，用次日留存使用者數 / 當日使用者活躍數就是次日留存率：

SELECT  b_time_load, COUNT(DISTINCT case when diff=1 then id else null end) as liucun_num, COUNT(DISTINCT case when diff=1 then id else null end)/ count(distinct id) as liucun_rate
FROM
(
    SELECT *, TIMESTAMPDIFF(DAY,a_time_load, b_time_load) as diff
    FROM
    (
        SELECT a.id, a.time_load as a_time_load,b.time_load as b_time_load
        FROM user_move as a
        LEFT JOIN user_move as b
        on a.id = b.id
    )as c
)as d
GROUP BY b_time_load

每天的活躍使用者數

select 登錄時間,count(distinct 使用者id) as 活躍使用者數
from 使用者行為資訊表 
where 應用名稱='相機' 
group by 登陸時間;

三日的留存數，三日留存率, 七日的留存數, 七日留存率
將diff後的數字改為3/7即可。

SELECT  b_time_load, COUNT(DISTINCT case when diff=3 then id else null end) as liucun_num, COUNT(DISTINCT case when diff=3 then id else null end)/ count(distinct id) as liucun_rate
FROM
(
    SELECT *, TIMESTAMPDIFF(DAY,a_time_load, b_time_load) as diff
    FROM
    (
        SELECT a.id, a.time_load as a_time_load,b.time_load as b_time_load
        FROM user_move as a
        LEFT JOIN user_move as b
        on a.id = b.id
    )as c
)as d
GROUP BY b_time_load

給定兩張表訂單表和使用者表：查詢2019年Q1季度，不同性別，不同年齡的成交使用者數，成交量及成交金額

根據性別、年齡進行分組，利用多表連線及聚合函式求出成交使用者數，成交量及成交金額。

select b.性別,b.age,
   count(distinct a.使用者id) as 使用者數,
   count(訂單id),
   sum(a.訂單金額)
from 訂單表 as a 
inner join 使用者表 as b
on a.使用者id = b.使用者id
where a.時間 between '2019-01-01' and '2019-03-31'
group by b.性别,b.age;

給定兩張表：訂單表和使用者表，2019年1–4月產生訂單的使用者，以及在次月的留存使用者數

select a.使用者id,
COUNT(case when TIMESTAMPDIFF(month,a.時間, b.時間)=1 then a.使用者id else null end) as liucun_num
from 訂單表 as a join 訂單表 as b
on a.使用者id = b.使用者id
where a.時間 between '2019-01-01' and '2019-04-30'
group by a.使用者id

給定一個表，表裡兩個欄位：user_id, date_key，找出來今日登入(2021–05–16)且一週內沒有登入的使用者id：

第一步，先把date_key欄位型別處理成日期形式；

第二步，查詢最近一週的使用者登入；

第三步，查詢今日登入的使用者id，不在最近一週登入的id裡的

select distinct user_id, date_format(date_key, '%Y-%m-%d') as login_date
from 使用者行為資訊表
where  date_format(date_key, '%Y-%m-%d') ='2021-05-16'
and user_id not in
(select distinct user_id
from 使用者行為資訊表
WHERE DATEDIFF('2021-05-16',date_key)<=7 and DATEDIFF('2021-05-16',date_key)>0 )
order by user_id;

員工表

有3個表dept(部門表)，emp（員工表），salgrade（薪水等級表）：

dept（DEPTNO，DNAME，LOC）代表（部門編號，部門名稱，位置）

emp（EMPNO，ENAME，JOB，MGR，HIREDATE，SAL，COMM，DEPTNO）代表（員工編號，員工姓名，工作崗位，上級經理編號，入職日期）

salgrade（GRADE，LOSAL，HISAL）代表（薪水級別，最低薪水，最高薪水）

取得每個部門最高薪水的人員名稱

答案

select e.deptno, e.ename, t.maxsal, e.sal
from (
      select e.deptno, max(e.sal) as maxsal
      from emp e
      group by e.deptno
     ) t
join emp e
on t.deptno = e.deptno
where t.maxsal = e.sal
order by e.deptno;

第一步：求出每個部門的最高薪水

select e.deptno, max(e.sal) as maxsal
from emp e
group by e.deptno;                                                                                             
+--------+---------+
| deptno | maxsal  |
+--------+---------+
|     10 | 5000.00 |
|     20 | 3000.00 |
|     30 | 2850.00 |
+--------+---------+

第二步：將以上查詢結果當成一個臨時表

select e.deptno, e.ename, t.maxsal, e.sal
from t
join emp e
on t.deptno = e.deptno
where t.maxsal = e.sal
order by e.deptno;
+--------+-------+---------+---------+
| deptno | ename | maxsal  | sal     |
+--------+-------+---------+---------+
|     10 | KING  | 5000.00 | 5000.00 |
|     20 | SCOTT | 3000.00 | 3000.00 |
|     20 | FORD  | 3000.00 | 3000.00 |
|     30 | BLAKE | 2850.00 | 2850.00 |
+--------+-------+---------+---------+
最後把t換下。

哪些人的薪水在部門平均薪水之上

答案

select t.detpno, e.ename
from (select e.deptno, avg(e.sal) as avgsal
      from emp e
      group by e.deptno) t
join emp e
on e.deptno = t.deptno
where e.sal > t.avgsal;

第一步：求出每個部門的平均薪水

select e.deptno, avg(e.sal) as avgsal
from emp e
group by e.deptno;
+--------+-------------+
| deptno | avgsal      |
+--------+-------------+
|     10 | 2916.666667 |
|     20 | 2175.000000 |
|     30 | 1566.666667 |
+--------+-------------+

第二步：需要保證員工在這個部門裡，將以上查詢結果當成臨時表t(deptno, avgsal)

select t.detpno, e.ename
from t
join emp e
on e.deptno = t.deptno
where e.sal > t.avgsal;

第三步：把臨時表t進行替換

select t.detpno, e.ename
from (select e.deptno, avg(e.sal) as avgsal
      from emp e
      group by e.deptno) t
join emp e
on e.deptno = t.deptno
where e.sal > t.avgsal;
+--------+-------+
| deptno | ename |
+--------+-------+
|     30 | ALLEN |
|     20 | JONES |
|     30 | BLAKE |
|     20 | SCOTT |
|     10 | KING  |
|     20 | FORD  |
+--------+-------+

取得部門中(所有人的)平均薪水等級

取得部門中所有人的平均薪水的等級
答案

select t.deptno, t.avgsal, s.grade
from (select e.deptno, avg(e.sal) as avgsal
      from emp e
      group by e.deptno) t
join salgrade s
on t.avgsal between s.losal and s.hisal;

第一步：求出每個部門的平均薪水

select e.deptno, avg(e.sal) as avgsal
from emp e
group by e.deptno;
+--------+-------------+
| deptno | avgsal      |
+--------+-------------+
|     10 | 2916.666667 |
|     20 | 2175.000000 |
|     30 | 1566.666667 |
+--------+-------------+

第二步：將以上表作為臨時表t，根據平均薪水在等級表中進行比對

select t.deptno, t.avgsal, s.grade
from t
join salgrade s
on t.avgsal between s.losal and s.hisal;

第三步：把臨時表t替換為子查詢

select t.deptno, t.avgsal, s.grade
from (select e.deptno, avg(e.sal) as avgsal
      from emp e
      group by e.deptno) t
join salgrade s
on t.avgsal between s.losal and s.hisal;
+--------+-------------+-------+
| deptno | avgsal      | grade |
+--------+-------------+-------+
|     30 | 1566.666667 |     3 |
|     10 | 2916.666667 |     4 |
|     20 | 2175.000000 |     4 |
+--------+-------------+-------+

取得部門中所有人的平均的薪水等級

答案

select t.deptno, avg(t.grade) as avgGrade
from (
      select e.deptno, e.ename, s.grade
      from emp e
      join salgrade s
      on e.sal between s.losal and s.hisal
      order by e.deptno
     )
group by t.deptno;

第一步：求出每個人的薪水等級

select xxx
from emp e
join salgrade s
on e.sal between s.losal and s.hisal
order by e.deptno;
+--------+--------+-------+
| deptno | ename  | grade |
+--------+--------+-------+
|     10 | CLARK  |     4 |
|     10 | MILLER |     2 |
|     10 | KING   |     5 |
+--------+--------+-------+

xxx为：e.deptno, e.ename, s.grade
(求得是部門等級值得平均值)

第二步：求出每組薪水的平均值

select t.deptno, avg(t.grade) as avgGrade
from t
group by t.deptno;
+--------+----------+
| deptno | avgGrade |
+--------+----------+
|     10 |   3.6667 |
|     20 |   2.8000 |
|     30 |   2.5000 |
+--------+----------+

不用組函式，取得最高薪水(給出兩種解決方案)

方案一：

select sal
from emp 
order by sal desc limit 1;

方案二：兩個表進行比較

select sal from emp 
where sal not in(
  select distinct a.sal
  from emp a
  join emp b
  on a.sal < b.sal);

取得平均薪水最高的部門的部門編號

答案

select e.deptno, avg(e.sal) as avgsal
from emp e
group by e.deptno
having avgsal = (select max(t.avgsal) as maxAvgSal from 
  (
    select e.deptno, avg(e.sal) as avgsal
    from emp e
    group by e.deptno
  )
);

第一步：求出部門平均薪水

select e.deptno, avg(e.sal) as avgsal
from emp e
group by e.deptno;
+--------+-------------+
| deptno | avgsal      |
+--------+-------------+
|     10 | 2916.666667 |
|     20 | 2175.000000 |
|     30 | 1566.666667 |
+--------+-------------+

第二步：將以上查詢結果當作臨時表t

select max(t.avgsal) as maxAvgSal from t;
+-------------+
| maxAvgSal   |
+-------------+
| 2916.666667 |
+-------------+

第三步：最大的平均值有了，用其做過濾

select e.deptno, avg(e.sal) as avgsal
from emp e
group by e.deptno
having avgsal = (select max(t.avgsal) as maxAvgSal from t);
+--------+-------------+
| deptno | avgsal      |
+--------+-------------+
|     10 | 2916.666667 |
+--------+-------------+

取得平均薪水最高的部門的部門名稱

部門名稱在dept表

select e.deptno, d.dname, avg(e.sal) as avgsal
from emp e
join dept d
on e.deptno = d.deptno
group by e.deptno, d.dname
having avgsal = (select max(t.avgsal) as maxAvgSal from t);

求平均薪水的等級最低的部門的部門名稱

答案

select 
    t.deptno,t.dname,s.grade
from    
    (select 
            e.deptno,d.dname,avg(e.sal) as avgsal
        from
            emp e
        join
            dept d
        on
            e.deptno = d.deptno
        group by
            e.deptno,d.dname)t
join
    salgrade s
on
    t.avgsal between s.losal and s.hisal
where
    s.grade = (select min(t.grade) as minGrade from (select 
                    t.deptno,t.dname,s.grade
                from    
                    (select 
                            e.deptno,d.dname,avg(e.sal) as avgsal
                        from
                            emp e
                        join
                            dept d
                        on
                            e.deptno = d.deptno
                        group by
                            e.deptno,d.dname)t
                join
                    salgrade s
                on
                    t.avgsal between s.losal and s.hisal)t);

第一步：求出部門的平均薪水

select 
    e.deptno,d.dname,avg(e.sal) as avgsal
from
    emp e
join
    dept d
on
    e.deptno = d.deptno
group by
    e.deptno,d.dname;
+--------+------------+-------------+
| deptno | dname      | avgsal      |
+--------+------------+-------------+
|     10 | ACCOUNTING | 2916.666667 |
|     20 | RESEARCH   | 2175.000000 |
|     30 | SALES      | 1566.666667 |
+--------+------------+-------------+

第二步：將以上查詢結果當作臨時表t，與salgrade表進行表連線

select 
    t.deptno,t.dname,s.grade
from    
    (select 
            e.deptno,d.dname,avg(e.sal) as avgsal
        from
            emp e
        join
            dept d
        on
            e.deptno = d.deptno
        group by
            e.deptno,d.dname)t
join
    salgrade s
on
    t.avgsal between s.losal and s.hisal;
+--------+------------+-------+
| deptno | dname      | grade |
+--------+------------+-------+
|     30 | SALES      |     3 |
|     10 | ACCOUNTING |     4 |
|     20 | RESEARCH   |     4 |
+--------+------------+-------+

排序求是不對的，得先求出最低等級。

第三步：將以上查詢結果當成一張臨時表t

select min(t.grade) as minGrade from (select 
                    t.deptno,t.dname,s.grade
                from    
                    (select 
                            e.deptno,d.dname,avg(e.sal) as avgsal
                        from
                            emp e
                        join
                            dept d
                        on
                            e.deptno = d.deptno
                        group by
                            e.deptno,d.dname)t
join
    salgrade s
on
    t.avgsal between s.losal and s.hisal)t;
+----------+
| minGrade |
+----------+
|        3 |
+----------+

第四步：最終

select 
    t.deptno,t.dname,s.grade
from    
    (select 
            e.deptno,d.dname,avg(e.sal) as avgsal
        from
            emp e
        join
            dept d
        on
            e.deptno = d.deptno
        group by
            e.deptno,d.dname)t
join
    salgrade s
on
    t.avgsal between s.losal and s.hisal
where
    s.grade = (select min(t.grade) as minGrade from (select 
                    t.deptno,t.dname,s.grade
                from    
                    (select 
                            e.deptno,d.dname,avg(e.sal) as avgsal
                        from
                            emp e
                        join
                            dept d
                        on
                            e.deptno = d.deptno
                        group by
                            e.deptno,d.dname)t
                join
                    salgrade s
                on
                    t.avgsal between s.losal and s.hisal)t);
+--------+-------+-------+
| deptno | dname | grade |
+--------+-------+-------+
|     30 | SALES |     3 |
+--------+-------+-------+

取得比普通員工（員工代碼沒有在mgr上出現的）的最高薪水還要高的經理人姓名

答案

select ename from emp 
where sal > (
      select max(sal) as maxsal 
      from emp 
      where empno not in(
            select distinct mgr from emp 
            where mgr is not null
            )
      );

第一步：找出普通員工(員工代碼沒有出現在mgr上的)

先找出mgr有哪些人
select distinct mgr from emp;
+------+
| mgr  |
+------+
| 7902 |
| 7698 |
| 7839 |
| 7566 |
| NULL |
| 7788 |
| 7782 |
+------+

第二步：求出普通員工得最高薪水

select max(sal) as maxsal from emp where empno not in(select distinct mgr from emp where mgr is not null);
+---------+
| maxsal  |
+---------+
| 1600.00 |
+---------+

not in不會自動忽略空值，in會自動忽略空值。一旦沒有忽略空值，None就會參與數學運算，結果就變為了None。not in是and， in引數關係是or。

第三步：比普通員工最高薪水還要高的

select ename from emp where sal > (
    select max(sal) as maxsal from emp where empno not in(
        select distinct mgr from emp where mgr is not null
    )
);
+-------+
| ename |
+-------+
| JONES |
| BLAKE |
| CLARK |
| SCOTT |
| KING  |
| FORD  |
+-------+

取得薪水最高的前五名員工

select * from emp order by sal desc limit 0,5;

取得薪水最高的第六到第十名員工

select * from emp order by sal desc limit 5,5;

取得最後入職的5名員工

select * from emp order by hiredate desc limit 5;

取得每個薪水等級有多少員工

答案

select
    t.grade,count(t.ename) as totalEmp
from
    (select 
        e.ename,s.grade 
    from 
        emp e
    join
        salgrade s
    on
        e.sal between s.losal and s.hisal) t
group by
    t.grade;

第一步：查詢出每個員工的薪水等級

select 
    e.ename,s.grade 
from 
    emp e
join
    salgrade s
on
    e.sal between s.losal and s.hisal
order by
    s.grade;
+--------+-------+
| ename  | grade |
+--------+-------+
| JAMES  |     1 |
| SMITH  |     1 |
| ADAMS  |     1 |
| MILLER |     2 |
| WARD   |     2 |
| MARTIN |     2 |
| ALLEN  |     3 |
| TURNER |     3 |
| BLAKE  |     4 |
| FORD   |     4 |
| CLARK  |     4 |
| SCOTT  |     4 |
| JONES  |     4 |
| KING   |     5 |
+--------+-------+

第二步：將以上查詢結果當成臨時表t(ename,grade)

select
    t.grade,count(t.ename) as totalEmp
from
    (select 
        e.ename,s.grade 
    from 
        emp e
    join
        salgrade s
    on
        e.sal between s.losal and s.hisal) t
group by
    t.grade;
+-------+----------+
| grade | totalEmp |
+-------+----------+
|     1 |        3 |
|     2 |        3 |
|     3 |        2 |
|     4 |        5 |
|     5 |        1 |
+-------+----------+

列出所有員工及領導的名字

select
    e.ename, b.ename as leadername
from
    emp e
left join 
    emp b
on 
    e.mgr = b.empno;

— （不用left連線的話，最高階的員工不會顯示。用了left後，最高階的員工會顯示，其 — — leader為null。外連線查詢的條數永遠>=內連線）

列出受僱日期早於其直接上級的所有員工編號、姓名、部門名稱

思路一：第一步將****emp a看成員工表，將emp b 看成領導表，員工表的mgr欄位應該等於領導表的主鍵欄位

select 
    e.empno,
    e.ename
from
    emp e
join
    emp b
on
    e.mgr = b.empno
where
    e.hiredate < b.hiredate;
+-------+-------+
| empno | ename |
+-------+-------+
|  7369 | SMITH |
|  7499 | ALLEN |
|  7521 | WARD  |
|  7566 | JONES |
|  7698 | BLAKE |
|  7782 | CLARK |
|  7876 | ADAMS |
+-------+-------+

第二步：顯示上面員工的部門名稱，將****emp a員工表和dept d進行關聯

select 
    d.dname,
    e.empno,
    e.ename
from
    emp e
join
    emp b
on
    e.mgr = b.empno
join
    dept d
on
    e.deptno = d.deptno
where
    e.hiredate < b.hiredate;
+------------+-------+-------+
| dname      | empno | ename |
+------------+-------+-------+
| ACCOUNTING |  7782 | CLARK |
| RESEARCH   |  7369 | SMITH |
| RESEARCH   |  7566 | JONES |
| RESEARCH   |  7876 | ADAMS |
| SALES      |  7499 | ALLEN |
| SALES      |  7521 | WARD  |
| SALES      |  7698 | BLAKE |
+------------+-------+-------+

列出部門名稱和這些部門的員工資訊，同時列出那些沒有員工的部門

select 
    d.dname,(部門名稱)
    e.*(該部門的員工資訊)
from
    emp e
right join
    dept d
on
    e.deptno = d.deptno;
(需要讓所有的部門顯示出來，因此需要用右外連線)
內連線和外連線分別省略了inner和outer關鍵字

列出至少有5個員工的所有部門

第一步：先求出每個部門有多少員工，將****emp a和部門表 dept d表進行關聯，條件是e.deptno=d.deptno

第二步：然後透過分組e.deptno,過來count(e.ename) >= 5

select 
    e.deptno,count(e.ename) as totalEmp
from
    emp e
group by
    e.deptno
having
    totalEmp >= 5;
+--------+----------+
| deptno | totalEmp |
+--------+----------+
|     20 |        5 |
|     30 |        6 |
+--------+----------+
2 rows in set

這裡比較關鍵：第一點使用了group by 欄位，select 後面的欄位只能是group by後面的欄位e.deptno和聚合函式對應的欄位count(e.ename) as totalEmp

第二點：現在要對聚合函式的結果進行過濾，totalEmp欄位不是資料庫中的欄位，不能使用where進行限制，只能使用having。

（子查詢）列出薪水比“SMITH”多的所有員工資訊

第一步：首先求出是，smith的工資

第二步：然後求出工資高於simith的

select * from emp where sal > (select sal from emp where ename = 'SMITH');

列出所有”CLERK”（辦事員）的姓名及其部門名稱，部門人數

答案

select t1.deptno, t1.dname, t1.ename, t2.totalEmp
from (
      select d.deptno, d.dname, e.ename
      from emp e
      join dept d
      on e.deptno = d.deptno
      where e.job = 'CLERK'
     )t1
join (
      select e.deptno, count(e.ename) as totalEmp
      from emp e
      group by e.deptno
     )t2
on t1.deptno = t2.deptno;

1、第一步在emp a表中查詢出那些人的job崗位是辦事員

2、將emp a表和dept d表相關聯就可以得到職位是辦事員的emp對應的部門名稱

3、查詢出每個部門對應的員工總數

4、將第三步的查詢結果作為一個臨時表t與第二步的查詢結果進行關聯，關聯條件是t.deptno = d.deptno

第一步先找出這一幫人

select d.deptno, d.dname, e.ename
from emp e
join dept d
on e.deptno = d.deptno
where e.job = 'CLERK';

第二步求出每個部門的員工數量

select e.deptno, count(e.ename) as totalEmp
from emp e
group by e.deptno;
最後彙總，把t1表換成第一個sql，t2換成第二個sql：
select t1.deptno, t1.dname, t1.ename, t2.totalEmp
from t1
join t2
on t1.deptno = t2.deptno;

（子查詢）列出最低薪水大於1500的各種工作及從事此工作的全部僱員人數

第一步：先求出每個工作崗位的最低薪水，把>1500的留下

select e.job, min(e.sal) as minsal
from emp e
group by e.job
having minsal > 1500;

第二步：新增count聚合函式，來檢視人數

select e.job, min(e.sal) as minsal, count(e.ename)as totalEmp
from emp e
group by e.job
having minsal > 1500

（子查詢）列出在部門“SALES”<銷售部>工作的員工的姓名，假定不知道銷售部門的部門編號

答案

select ename from emp 
where deptno = (
    select deptno from dept where dname = 'SALES'
);

第一步：先求出部門的部門編號；

select deptno from dept where dname = 'SALES';
+--------+
| deptno |
+--------+
|     30 |
+--------+

第二步：再從部門select部門中的員工姓名；

select ename from emp where deptno = (select deptno from dept where dname = 'SALES');
+--------+
| ename  |
+--------+
| ALLEN  |
| WARD   |
| MARTIN |
| BLAKE  |
| TURNER |
| JAMES  |

（經典）列出薪金高於公司平均薪金的所有員工，所在部門、上級領導、僱員的工資等級

答案

select e.ename,d.dname, b.ename as leadername, s.grade
from emp e
join dept d
on e.deptno = d.deptnp
left join emp b
on e.mgr = b.empno       --員工的領導編號 等於 領導的員工編號
join salgrade s
on e.sal between s.losal and s.hisal
where e.sal >(select avg(sal) as avgsal from emp);

第一步：求出薪金高於公司平均薪金的所有員工

第二步：把第一步的結果當成臨時表t 將臨時表t和部門表 dept d 和工資等級表salary s進行關聯，求出員工所在的部門，僱員的工資等級等

關聯的條件是t.deptno = d.deptno t.salary betweent s.lower and high;

第三步：求出第一步條件下的所有的上級領導，因為有的員工沒有上級領導需要使用left join 左連線

第一步：求出公司的平均薪水
select avg(sal) as avgsal from emp;
+-------------+
| avgsal      |
+-------------+
| 2073.214286 |
+-------------+
第二步：列出薪水高於平均薪水的所有員工
select e.ename
from emp e
where e.sal >(select avg(sal) as avgsal from emp);
第三步：列出所有員工的所在部門(需要join on)
select e.ename,d.dname
from emp e
join dept d
on e.deptno = d.deptnp
where e.sal >(select avg(sal) as avgsal from emp);
第四步：列出所有員工的上級領導(需要join on)
select e.ename,d.dname, b.ename as leadername
from emp e
join dept d
on e.deptno = d.deptnp
join emp b
on e.mgr = b.emp       --員工的領導編號 等於 領導的員工編號
where e.sal >(select avg(sal) as avgsal from emp);
第五步：要求列出所有員工，在第二個join，員工表是emp e表，否則會只顯示有領導的員工
select e.ename,d.dname, b.ename as leadername
from emp e
join dept d
on e.deptno = d.deptnp
left join emp b
on e.mgr = b.empno       --員工的領導編號 等於 領導的員工編號
where e.sal >(select avg(sal) as avgsal from emp);
第六步：僱員的工資等級
select e.ename,d.dname, b.ename as leadername, s.grade
from emp e
join dept d
on e.deptno = d.deptnp
left join emp b
on e.mgr = b.empno       --員工的領導編號 等於 領導的員工編號
join salgrade s
on e.sal between s.losal and s.hisal
where e.sal >(select avg(sal) as avgsal from emp);

列出與“SCOTT”從事相同工作的所有員工及部門名稱

step1：查詢出SCOTT的工作崗位

select job from emp where ename = 'SCOTT';
+---------+
| job     |
+---------+
| ANALYST |
+---------+

step2：部門名稱(需要join部門表)

select 
    d.dname,
    e.*
from
    emp e
join
    dept d
on
    e.deptno = d.deptno
where
    e.job = (select job from emp where ename = 'SCOTT');

列出薪金中等於第30號部門中員工的薪金的其它員工的姓名和薪金

答案

select ename, sal from emp
where sal in
(select distinct sal
from emp
where deptno = 30)
and
deptno <> 30;

第一步：先知道第30號部門中員工的薪金有哪幾種值

select distinct sal
from emp
where deptno = 30;

第二步：顯示姓名和薪水

select ename, sal from emp
where sal in
(
    select distinct sal from emp where deptno = 30
);

第三步：需要滿足”其他員工”的條件

select ename, sal from emp
where sal in
(
    select distinct sal from emp where deptno = 30
)
and
deptno <> 30;

列出薪金高於在第30號部門中工作的所有員工的薪金的員工姓名和薪金、部門名稱

第一步：找出部門30中的最高薪水

select max(sal) as maxsal
from emp
where deptno = 30;

第二步：要輸出的是員工姓名，因需要把emp表作為主表

select d.dname, e.ename, e.sal
from emp e
join dept d
on e.deptno = d.deptno
where e.sal > (select max(sal) as maxsal
from emp;

（關鍵）列出在每個部門工作的員工數量、平均工資和平均服務期限

答案

select d.deptno, count(e.ename),
      ifnull(avg(e.sal),0) as avgsal,
      avg(ifnull((to_days(now())-to_days(hiredate))/365,0)) as serverTime
from emp e
right join dept d
on e.deptno = d.deptno
group by d.deptno;

第一步：求出每個部門對應的所有員工，這裡使用了右連線，保證顯示所有的部門，但是有的部門不存在員工，但是也必須把所有的部門顯示出來

-- 將員工表emp e和部門表dept d進行表連線，將員工表和部門表資訊全部展示
select e.*, d.*
from emp e
right join dept d
on e.deptno = d.deptno;

第二步：在第一步的基礎上求出所有員工的數量，這裡因為有的部門員工是null，所有不能使用count(*)，count(*)統計包含null，應該使用count（e.ename）

-- 列出每個部門工作的員工數量
select d.deptno, count(e.ename)
from emp e
right join dept d
on e.deptno = d.deptno
group by d.deptno;

第三步：求出員工的平均工資，因為有的部門員工不存在，所以對應的工作也是null，這裡需要null值做處理

處理：
IFNULL(expr1, expr2)，如果expr1不是Null，IFNULL()返回expr1，否則返回expr2。
-- 列出每個部門工作的員工數量
select d.deptno, count(e.ename),
      ifnull(avg(e.sal),0) as avgsal
from emp e
right join dept d
on e.deptno = d.deptno
group by d.deptno;

第四步：求出每個員工的平均服務期限：平均服務期限，每個人從入職到今天，一共服務了多少年。相加除以部門人數。

處理：
IFNULL(expr1, expr2)，如果expr1不是Null，IFNULL()返回expr1，否則返回expr2。
-- to_days(日期類型) -> 天數
-- 獲取資料庫的系統當前時間的函式
select to_days(now());
-- 算出員工工作多少天
select ename, (to_days(now())-to_days(hiredate))/365 as serveryear
from emp;
--算出員工工作多少年
select avg((to_days(now())-to_days(hiredate))/365)as serveryear from emp
最終：
select d.deptno, count(e.ename),
      ifnull(avg(e.sal),0) as avgsal,
      avg(ifnull((to_days(now())-to_days(hiredate))/365,0)) as serverTime
from emp e
right join dept d
on e.deptno = d.deptno
group by d.deptno
注意：
count(*) 計算行的數目，包含 NULL
count(column) 特定的列的非空值的行數，不包含 NULL 值。

列出所有員工姓名、部門名稱和工資

-- 注意是所有員工
select d.dname,e.ename,e.sal
from emp e
right join dept d
on e.deptno = d.deptno

列出所有部門的詳細資訊和人數

統計人數的時候不能使用count(*),而要使用count(e.ename)欄位的值，同時
select 
    d.deptno,d.dname,d.loc,count(e.ename) as totalEmp
from
    emp e
right join
    dept d
on
    e.deptno = d.deptno
group by
    d.deptno,d.dname,d.loc;
+--------+------------+----------+----------+
| deptno | dname      | loc      | totalEmp |
+--------+------------+----------+----------+
|     10 | ACCOUNTING | NEW YORK |        3 |
|     20 | RESEARCH   | DALLAS   |        5 |
|     30 | SALES      | CHICAGO  |        6 |
|     40 | OPERATIONS | BOSTON   |        0 |

列出各種工作的最低工資及從事此工作的僱員姓名

第一步：求出各種工作的最低工資

select 
    e.job,min(e.sal) as minsal
from
    emp e
group by 
    e.job;
+-----------+---------+
| job       | minsal  |
+-----------+---------+
| ANALYST   | 3000.00 |
| CLERK     |  800.00 |
| MANAGER   | 2450.00 |
| PRESIDENT | 5000.00 |
| SALESMAN  | 1250.00 |
+-----------+---------+

第二步將以上查詢結果當成臨時表t(job,minsal)

select 
    e.ename
from
    emp e
join
    (select 
        e.job,min(e.sal) as minsal
    from
        emp e
    group by 
        e.job) t
on
    e.job = t.job
where
    e.sal = t.minsal;
+--------+
| ename  |
+--------+
| SMITH  |
| WARD   |
| MARTIN |
| CLARK  |
| SCOTT  |
| KING   |
| FORD   |

列出各個部門Manager的最低薪金

各個部門，需要進行分組；

select e.deptno, min(e.sal) as minsal
from emp e
where e.job = 'Manager'
group by e.deptno;

列出所有員工的年薪，按年薪從低到高進行排序

薪水為年薪+補助，給補助加上空值處理函式。

select ename, (sal + ifnull(comm, 0))*12 as yearsal from emp 
order by yearsal asc;

求出員工領導的薪水超過3000的員工名和領導名

先求出員工所對應的領導，最後再把員工領導的薪水超過3000的選出。

員工表連線領導表，員工的領導編號等於領導的員工編號

select e.ename, b.ename as leadername
from emp e
join emp b
on e.mgr = b.empno
where b.sal > 3000;

求部門名稱中帶’s’字元的部門員工的工資合計、部門人數

先求出部門中帶s的有哪些部門；

select d.dname, sum(e.sal) as sumsal, count(e.ename) as totalEmp
from emp e
join dept d
on e.deptno = d.deptno
where d.dname like '%s%'
group by d.dname;

給任職日期超過30年的員工加薪10%

修改需要用到update語句，

create table emp_bak1 as select * from emp;

update emp_bak1 set sal = sal * 1.1 
where (to_days(now())-to_days(hiredate))/365 >30;

學生表

有3個表S(學生表)，C（課程表），SC（學生選課表）：
S（SNO，SNAME）代表（學號，姓名）
C（CNO，CNAME，CTEACHER）代表（課號，課名，教師）
SC（SNO，CNO，SCGRADE）代表（學號，課號，成績）

找出沒選過”黎明”老師的所有學生姓名。

第一種做法：子查詢
黎明老師的授課的編號 →先找出選過黎明老師的學生編號 → 在學生表中找出

一、找出黎明老師的授課的編號
select cno from c where cteacher = '黎明';
二、再找出選過黎明老師的學生編號
select sno from sc where cno in (select cno from c where cteacher = '黎明');
三、集合
select * from s where sno not in(select sno from sc where cno = (select cno from c where cteacher = '黎明'));

第二種做法 — 表連線做法：

第一步：找到黎明老師所上課對應的課程對應的課程編號
select cno from c where  cteacher = '黎明';
第二步：求出那些學生選修了黎明老師的課程
select sno from sc join(
    select cno from c where  cteacher = '黎明'
)t on sc.cno = t.cno;
第三步：求出那些學生沒有選擇黎明老師的課
select sno,sname from s where sno not in(select sno from sc join( select cno from c where  cteacher = '黎明') t
on sc.cno = t.cno);

列出2門以上（含2門）不及格學生姓名及平均成績

思路一：在sc表中首先按照學生編號進行分組，得到哪些學生的有兩門以上的成績低於60分


第一步：先查詢學生不及格的門數
select 
     sc.sno ,count(*) as studentNum
from 
     sc 
where 
      scgrade < 60
group by
      sc.sno
having
      studentNum >= 2;
（現在只得到了學生編號，需要在s表中找到學生姓名）
第二步：查詢出該學生對應的編號
select 
    a.sno , a.sname
from
    s as a
join
   (
      select 
            sc.sno ,count(*) as studentNum
        from 
             sc 
        where 
              scgrade < 60
        group by
              sc.sno
        having
              studentNum >= 2
     
   ) as b
on
   a.sno = b.sno;   
+-----+----------+
| sno | sname    |
+-----+----------+
|   1 | zhangsan |
+-----+----------+
1 row in se

第三步得到該學生的平均成績，把上面的表當成臨時表m

select 
   m.sno,m.sname,avg(d.scgrade)
from
   sc as d
join
   (
        select 
            a.sno , a.sname
        from
            s as a
        join
           (
              select 
                    sc.sno ,count(*) as studentNum
                from 
                     sc 
                where 
                      scgrade < 60
                group by
                      sc.sno
                having
                      studentNum >= 2
             
           ) as b
        on
           a.sno = b.sno
   
   )  as m
on
  m.sno = d.sno   
group by
   d.sno ;

簡單寫法：

select t1.snmae, t2.avgscgrade
from t1
join t2
on t1.sno=t2.sno;

既學過1號課程又學過2號課所有學生的姓名

select s.sname from 
    sc 
join
    s
on
    sc.sno = s.sno
where 
    cno = 1 and sc.sno in(select sno from sc where cno = 2);

(姓名不在sc表中，因此需要用到join)

不能寫成下面的形式會存在錯誤

select sno from sc where cno=1 and cno =2;

分段使用者數

給你兩個表，表A為：uid, age；表B：uid、package_name、dtm。表B有100億條，需求：每10歲為一年齡段，要每個年齡段的活躍使用者數、使用應用數、使用應用的總次數

select 
 count(distinct B.uid) as 活躍用戶數,
 count(distincct B.package_name) as 使用應用數,
 count(B.dtm) as 使用應用的總次數
from B
join (
 select A.uid,
     case when age <= 10 and age > 10 then '0-10'
     when age <= 20 and age > 10 then '10-20'
     when age > 20 and age <= 30 then '20-30'
     when age > 30 and age <= 40 then '30-40'
     else '40+' END as age_stage 
From A) as C
on C.uid = B.uid
group by age_stage;

時間戳考察

把時間得int資料轉化為時間戳

20210902轉化為2021–09–02

from_unixtime(unix_timestamp(cast(20210902 as string),'yyyyMMdd'),'yyyy-MM-dd')

算時間差

where DATEDIFF('2021-05-16',date_key)<=7 and DATEDIFF('2021-05-16',date_key)>0

以上就是本期的內容分享~~

文章來源：Datawhale ，作者王大鹏

文章連結：https://mp.weixin.qq.com/s/h2M_K4lDAIuBZRQdLrlFhw

※※※※※※※※※※※※※※※※※※※※※※※※※※※※※※※※※

我是「數據分析那些事」。常年分享數據分析乾貨，不定期分享好用的職場技能工具。各位也可以關注我的Facebook，按讚我的臉書並私訊「10」，送你十週入門數據分析電子書唷！期待你與我互動起來~

文章推薦

◆常用的幾個經典Python模組

◆BI和報表的區別，終於有人說清楚了！

◆MySQL常用指令碼

◆商業分析應該怎麼做？一篇文章把思維和工具說清楚了！

◆會員流入流出視覺化的最佳選擇，桑基圖！

回顧十週入門數據分析系列文：

學習計劃｜帶你10周入門資料分析

「我是文科生出身，可以學習資料分析嗎？」「我沒有編程基礎，可以成為資料分析師嗎？」「學習資料分析必須學習R和Python嗎？」 … … 其實，資料分析沒有想像中那麼難，入門也沒有那麼多條條框框。…

medium.com

關注數據君的臉書：

我是「數據分析那些事」。常年分享數據分析乾貨，不定期分享好用的職場技能工具。按贊我的臉書，並在臉書置頂帖子下回復SQL50,會有MySQL經典50題及答案贈送唷！