# 161 视频

### SQL156 各个视频的平均完播率

#### 描述

用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0          | 1        | 1           | NULL        |
| 2  | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:24 | 0          | 0        | 1           | NULL        |
| 3  | 103 | 2001      | 2021-10-01 11:00:00 | 2021-10-01 11:00:34 | 0          | 1        | 0           | 1732526     |
| 4  | 101 | 2002      | 2021-09-01 10:00:00 | 2021-9-01 10:00:42  | 1          | 0        | 1           | NULL        |
| 5  | 102 | 2002      | 2021-10-01 11:00:00 | 2021-10-01 10:00:30 | 1          | 0        | 1           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 影视  | 30       | 2021-01-01 07:00:00 |
| 2  | 2002      | 901    | 美食  | 60       | 2021-01-01 07:00:00 |
| 3  | 2003      | 902    | 旅游  | 90       | 2021-01-01 07:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长（秒）, release\_time-发布时间）

**问题**：计算2021年里有播放记录的每个视频的完播率(结果保留三位小数)，并按完播率降序排序

**注**：视频完播率是指完成播放次数占总播放次数的比例。简单起见，结束观看时间与开始播放时间的差>=视频时长时，视为完成播放。

**输出示例**：

示例数据的结果如下：

| video\_id | avg\_comp\_play\_rate |
| --------- | --------------------- |
| 2001      | 0.667                 |
| 2002      | 0.000                 |

解释：

视频2001在2021年10月有3次播放记录，观看时长分别为30秒、24秒、34秒，视频时长30秒，因此有两次是被认为完成播放了的，故完播率为0.667；

视频2002在2021年9月和10月共2次播放记录，观看时长分别为42秒、30秒，视频时长60秒，故完播率为0.000。

#### 示例：

```sql
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
  (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
  (102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24', 0, 0, 1, null),
  (103, 2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34', 0, 1, 0, 1732526),
  (101, 2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42', 1, 0, 1, null),
  (102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
  (2001, 901, '影视', 30, '2021-01-01 7:00:00'),
  (2002, 901, '美食', 60, '2021-01-01 7:00:00'),
  (2003, 902, '旅游', 90, '2021-01-01 7:00:00');
```

#### 解答

1.完播率：sum(if(end\_time - start\_time >= duration, 1, 0))/总数

if（x，1，0）满足x为1，否则为0

2.降序排序： desc

3.三位小数：round（x，3）

4.两个表格合二为一，用连接

```sql
select
	l.video_id ,
	   round(sum(if(end_time - start_time >= duration, 1, 0))/ count(start_time), 3) as avg_comp_play_rate
from
	tb_user_video_log l
left join tb_video_info i
on
	l.video_id = i.video_id
where
	year(start_time) = 2021
group by
	l.video_id
order by
	avg_comp_play_rate desc;
```

### SQL157 平均播放进度大于60%的视频类别

#### 描述

用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:30 | 0          | 1        | 1           | NULL        |
| 2  | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:21 | 0          | 0        | 1           | NULL        |
| 3  | 103 | 2001      | 2021-10-01 11:00:50 | 2021-10-01 11:01:20 | 0          | 1        | 0           | 1732526     |
| 4  | 102 | 2002      | 2021-10-01 11:00:00 | 2021-10-01 11:00:30 | 1          | 0        | 1           | NULL        |
| 5  | 103 | 2002      | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1          | 0        | 1           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 影视  | 30       | 2021-01-01 07:00:00 |
| 2  | 2002      | 901    | 美食  | 60       | 2021-01-01 07:00:00 |
| 3  | 2003      | 902    | 旅游  | 90       | 2021-01-01 07:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release\_time-发布时间）

**问题**：计算各类视频的平均播放进度，将进度大于60%的类别输出。

**注**：

* 播放进度=播放时长÷视频时长\*100%，当播放时长大于视频时长时，播放进度均记为100%。
* 结果保留两位小数，并按播放进度倒序排序。

**输出示例**：

示例数据的输出结果如下：

| tag | avg\_play\_progress |
| --- | ------------------- |
| 影视  | 90.00%              |
| 美食  | 75.00%              |

解释：

影视类视频2001被用户101、102、103看过，播放进度分别为：30秒（100%）、21秒（70%）、30秒（100%），平均播放进度为90.00%（保留两位小数）；

美食类视频2002被用户102、103看过，播放进度分别为：30秒（50%）、60秒（100%），平均播放进度为75.00%（保留两位小数）；

#### 示例1

输入：

```
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
  (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
  (102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:21', 0, 0, 1, null),
  (103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:20', 0, 1, 0, 1732526),
  (102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null),
  (103, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 1, 0, 1, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
  (2001, 901, '影视', 30, '2021-01-01 7:00:00'),
  (2002, 901, '美食', 60, '2021-01-01 7:00:00'),
  (2003, 902, '旅游', 90, '2020-01-01 7:00:00');
```

输出：

```
影视|90.00%
美食|75.00%
```

#### 解答

1.各类视频的平均播放进度：TIMESTAMPDIFF(second, start\_time, end\_time) / duration

2.进度百分比: round(avg(x)\*100,2)

3.输出结果有%号：concat(x,"%")

4.两个表连接

5.根据tag分组，根据>60%筛选，降序排序

```sql
select
		tag,
		CONCAT(ROUND(AVG(if(TIMESTAMPDIFF(second, start_time, end_time) > duration, 1, TIMESTAMPDIFF(second, start_time, end_time) / duration)) * 100, 2), "%") as avg_play_progress
from
		tb_user_video_log vl
join tb_video_info vi
on 
	vl.video_id = vi.video_id
group by
		tag
having
		avg_play_progress > 60
order by
		avg_play_progress desc;
```

知识点：

1.CONCAT 函数用于将两个字符串连接为一个字符串，试一下下面这个例子：

```
SQL> SELECT CONCAT('FIRST ', 'SECOND');
    +----------------------------+
    | CONCAT('FIRST ', 'SECOND') |
    +----------------------------+
    | FIRST SECOND               |
    +----------------------------+
    1 row in set (0.00 sec)
```

2.TIMESTAMPDIFF函数

TIMESTAMPDIFF(unit,begin,end);

`TIMESTAMPDIFF`函数返回`begin-end`的结果，其中`begin`和`end`是[DATE](http://www.yiibai.com/mysql/date.html)或[DATETIME](http://www.yiibai.com/mysql/datetime.html)表达式。

`TIMESTAMPDIFF`函数允许其参数具有混合类型，例如，`begin`是`DATE`值，`end`可以是`DATETIME`值。 如果使用`DATE`值，则`TIMESTAMPDIFF`函数将其视为时间部分为`“00:00:00”`的`DATETIME`值。

`unit`参数是确定(`end-begin`)的结果的单位，表示为整数。 以下是有效单位：

* MICROSECOND
* SECOND
* MINUTE
* HOUR
* DAY
* WEEK
* MONTH
* QUARTER
* YEAR

### SQL158 每类视频近一个月的转发量/率

#### 描述

用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 0          | 1        | 1           | NULL        |
| 2  | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0          | 0        | 1           | NULL        |
| 3  | 103 | 2001      | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 0          | 1        | 0           | 1732526     |
| 4  | 102 | 2002      | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1          | 0        | 1           | NULL        |
| 5  | 103 | 2002      | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1          | 0        | 0           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 影视  | 30       | 2021-01-01 07:00:00 |
| 2  | 2002      | 901    | 美食  | 60       | 2021-01-01 07:00:00 |
| 3  | 2003      | 902    | 旅游  | 90       | 2020-01-01 07:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release\_time-发布时间）

**问题**：统计在有用户互动的最近一个月（按包含当天在内的近30天算，比如10月31日的近30天为10.2\~10.31之间的数据）中，每类视频的转发量和转发率（保留3位小数）。

**注**：转发率＝转发量÷播放量。结果按转发率降序排序。

**输出示例**：

示例数据的输出结果如下

| tag | retweet\_cut | retweet\_rate |
| --- | ------------ | ------------- |
| 影视  | 2            | 0.667         |
| 美食  | 1            | 0.500         |

解释：

由表tb\_user\_video\_log的数据可得，数据转储当天为2021年10月1日。近30天内，影视类视频2001共有3次播放记录，被转发2次，转发率为0.667；美食类视频2002共有2次播放记录，1次被转发，转发率为0.500。

#### 示例1

输入：

```
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 0, 1, 1, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 0, 1, 0, 1732526)
  ,(102, 2002, '2021-09-10 11:00:00', '2021-09-10 11:00:30', 1, 0, 1, null)
  ,(103, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 1, 0, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '影视', 30, '2021-01-01 7:00:00')
  ,(2002, 901, '美食', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '旅游', 90, '2020-01-01 7:00:00');
```

复制

输出：

```
影视|2|0.667
美食|1|0.500
```

#### 解答

1.转发率=ROUND(SUM(if\_retweet)/ COUNT(\*), 3)

2.有用户互动的最近一个月：

DATEDIFF(DATE((select max(start\_time) from tb\_user\_video\_log)), DATE(vl.start\_time)) <= 29

```sql
select
	vi.tag,
	SUM(if_retweet) retweet_cnt,
	ROUND(SUM(if_retweet)/ COUNT(*), 3) retweet_rate
from
	tb_user_video_log vl
left join tb_video_info vi
on
	vl.video_id = vi.video_id
where
	DATEDIFF(DATE((select max(start_time) from tb_user_video_log)), DATE(vl.start_time)) <= 29
group by
	vi.tag
order by
	retweet_rate desc 

```

### SQL159 每个创作者每月的涨粉率及截止当前的总粉丝量

#### 描述

用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-09-01 10:00:00 | 2021-09-01 10:00:20 | 0          | 1        | 1           | NULL        |
| 2  | 105 | 2002      | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1          | 0        | 1           | NULL        |
| 3  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1          | 1        | 1           | NULL        |
| 4  | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0          | 0        | 1           | NULL        |
| 5  | 103 | 2001      | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1          | 1        | 0           | 1732526     |
| 6  | 106 | 2002      | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 2          | 0        | 0           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 影视  | 30       | 2021-01-01 07:00:00 |
| 2  | 2002      | 901    | 美食  | 60       | 2021-01-01 07:00:00 |
| 3  | 2003      | 902    | 旅游  | 90       | 2020-01-01 07:00:00 |
| 4  | 2004      | 902    | 美女  | 90       | 2020-01-01 08:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release\_time-发布时间）

**问题**：计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量

**注**：

* 涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
* **if\_follow-是否关注**为1表示用户观看视频中关注了视频创作者，为0表示此次互动前后关注状态未发生变化，为2表示本次观看过程中取消了关注。

**输出示例**：

示例数据的输出结果如下

| author | month   | fans\_growth\_rate | total\_fans |
| ------ | ------- | ------------------ | ----------- |
| 901    | 2021-09 | 0.500              | 1           |
| 901    | 2021-10 | 0.250              | 2           |

解释：

示例数据中表tb\_user\_video\_log里只有视频2001和2002的播放记录，都来自创作者901，播放时间在2021年9月和10月；其中9月里加粉量为1，掉粉量为0，播放量为2，因此涨粉率为0.500（保留3位小数）；其中10月里加粉量为2，掉份量为1，播放量为4，因此涨粉率为0.250，截止当前总粉丝数为2。

#### 示例1

输入：

```sql
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-01 10:00:00', '2021-09-01 10:00:20', 0, 1, 1, null)
  ,(105, 2002, '2021-09-10 11:00:00', '2021-09-10 11:00:30', 1, 0, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 1, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 2, 0, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '影视', 30, '2021-01-01 7:00:00')
  ,(2002, 901, '影视', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '旅游', 90, '2020-01-01 7:00:00')
  ,(2004, 902, '美女', 90, '2020-01-01 8:00:00');
```

复制

输出：

```
901|2021-09|0.500|1
901|2021-10|0.250|2
```

#### 解答

1.计算两个列的差值可以使用：sum（case..when...when...）

2.round:计算保留小数位数

```
SELECT ROUND(column_name,decimals) FROM TABLE_NAME;
```

| 参数           | 描述             |
| ------------ | -------------- |
| column\_name | 必需。要舍入的字段。     |
| decimals     | 可选。规定要返回的小数位数。 |

3.分区函数Partition By的用法：

```
row_number() over(partition by ... order by ...)
rank() over(partition by ... order by ...)
dense_rank() over(partition by ... order by ...)
count() over(partition by ... order by ...) 求分组后的总数
max() over(partition by ... order by ...) 求分组后的最大值
min() over(partition by ... order by ...) 求分组后的最小值
sum() over(partition by ... order by ...) 求分组后的总和
avg() over(partition by ... order by ...) 求分组后的平均值
first_value() over(partition by ... order by ...) 求分组后的第一个值
last_value() over(partition by ... order by ...) 求分组后的最后一个值
lag() over(partition by ... order by ...) 取出分组后前n行数据
lead() over(partition by ... order by ...) 取出分组后后n行数据

```

```sql
select author,
       date_format(start_time, '%Y-%m') month,
       round(sum(case
           when if_follow = 1 then 1
           when if_follow = 2 then -1
                              else 0 end) / count(author), 3) fans_growth_rate,

       sum(sum(case
           when if_follow = 1 then 1
           when if_follow = 2 then -1
                              else 0 end)) over (partition by author order by date_format(start_time, '%Y-%m')) total_fans
    from tb_user_video_log log
             left join tb_video_info info on log.video_id = info.video_id
    where year(start_time) = 2021
    group by author, month
    order by author, total_fans
```

### **SQL160** **国庆期间每类视频点赞量和转发量**

#### 描述

用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1          | 1        | 0           | NULL        |
| 2  | 105 | 2002      | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0          | 0        | 1           | NULL        |
| 3  | 102 | 2002      | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1          | 1        | 1           | NULL        |
| 4  | 101 | 2002      | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1          | 0        | 1           | NULL        |
| 5  | 101 | 2002      | 2021-09-27 11:00:00 | 2021-09-27 11:00:30 | 1          | 1        | 0           | NULL        |
| 6  | 102 | 2002      | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1          | 0        | 1           | NULL        |
| 7  | 103 | 2002      | 2021-09-29 11:00:00 | 2021-10-02 11:00:30 | 1          | 0        | 1           | NULL        |
| 8  | 102 | 2002      | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1          | 1        | 1           | NULL        |
| 9  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1          | 1        | 0           | NULL        |
| 10 | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0          | 0        | 1           | NULL        |
| 11 | 103 | 2001      | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1          | 1        | 0           | 1732526     |
| 12 | 106 | 2002      | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2          | 0        | 1           | NULL        |
| 13 | 107 | 2002      | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1          | 0        | 1           | NULL        |
| 14 | 108 | 2002      | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1          | 1        | 1           | NULL        |
| 15 | 109 | 2002      | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0          | 1        | 0           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 旅游  | 30       | 2020-01-01 07:00:00 |
| 2  | 2002      | 901    | 旅游  | 60       | 2021-01-01 07:00:00 |
| 3  | 2003      | 902    | 影视  | 90       | 2020-01-01 07:00:00 |
| 4  | 2004      | 902    | 美女  | 90       | 2020-01-01 08:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release\_time-发布时间）

#### **问题**：

统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量，结果按视频类别降序、日期升序排序。假设数据库中数据足够多，至少每个类别下国庆头3天及之前一周的每天都有播放记录。

**输出示例**：

示例数据的输出结果如下

| tag | dt         | sum\_like\_cnt\_7d | max\_retweet\_cnt\_7d |
| --- | ---------- | ------------------ | --------------------- |
| 旅游  | 2021-10-01 | 5                  | 2                     |
| 旅游  | 2021-10-02 | 5                  | 3                     |
| 旅游  | 2021-10-03 | 6                  | 3                     |

解释：

由表tb\_user\_video\_log里的数据可得只有旅游类视频的播放，2021年9月25到10月3日每天的点赞量和转发量如下：

| tag | dt         | like\_cnt | retweet\_cnt |
| --- | ---------- | --------- | ------------ |
| 旅游  | 2021-09-25 | 1         | 2            |
| 旅游  | 2021-09-26 | 0         | 1            |
| 旅游  | 2021-09-27 | 1         | 0            |
| 旅游  | 2021-09-28 | 0         | 1            |
| 旅游  | 2021-09-29 | 0         | 1            |
| 旅游  | 2021-09-30 | 1         | 1            |
| 旅游  | 2021-10-01 | 2         | 1            |
| 旅游  | 2021-10-02 | 1         | 3            |
| 旅游  | 2021-10-03 | 1         | 0            |

因此国庆头3天（10.01~~10.03）里10.01的近7天（9.25~~10.01）总点赞量为5次，单天最大转发量为2次（9月25那天最大）；同理可得10.02和10.03的两个指标。

#### 示例1

输入：

```sql
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:20', 1, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 0, 0, 1, null)
  ,(102, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 1, 1, null)
  ,(101, 2002, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 1, null)
  ,(101, 2002, '2021-09-27 11:00:00', '2021-09-27 11:00:30', 1, 1, 0, null)
  ,(102, 2002, '2021-09-28 11:00:00', '2021-09-28 11:00:30', 1, 0, 1, null)
  ,(103, 2002, '2021-09-29 11:00:00', '2021-09-29 11:00:30', 1, 0, 1, null)
  ,(102, 2002, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 0, 1, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:05', 0, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2020-01-01 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '影视', 90, '2020-01-01 7:00:00')
  ,(2004, 902, '美女', 90, '2020-01-01 8:00:00');
```

复制

输出：

```
旅游|2021-10-01|5|2
旅游|2021-10-02|5|3
旅游|2021-10-03|6|3
```

#### 解答

1.开窗函数的框架限定算数：

* rows n perceding：从当前行到前n行（一共n+1行）
* rang/rows between 边界规则1 and 边界规则2：rang表示按照值的范围进行定义框架，rows表示按照行的范围进行定义框架

```
rows between 2 perceding and 2 following #当前行往前2行+当前行+当前行往后2行（一共5行）
rows between 1 following 3 following #当前行的后1——>后3（共3行）
rows between unbounded preceding and current row #从第一行到当前行
```

![img](https://raw.githubusercontent.com/qkd90/figureBed/main/202407111710520.png)

```sql
WITH t1 AS (SELECT tag,
                   DATE_FORMAT(start_time, '%Y-%m-%d') dt,
                   SUM(SUM(if_like)) OVER (PARTITION BY tag ORDER BY DATE_FORMAT(start_time, '%Y-%m-%d') rows 6 preceding) AS sum_like_cnt_7d,
                   MAX(SUM(if_retweet)) OVER (PARTITION BY tag ORDER BY DATE_FORMAT(start_time, '%Y-%m-%d') rows 6 preceding) AS max_retweet_cnt_7d
                FROM tb_user_video_log
                         JOIN tb_video_info USING (video_id)
                WHERE DATEDIFF('2021-10-03', DATE_FORMAT(start_time, '%Y-%m-%d')) < 9
                GROUP BY dt, tag)
SELECT *
    FROM t1
    WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
    ORDER BY tag DESC, dt;
```

### **SQL161** **近一个月发布的视频中热度最高的top3视频**

### 描述

现有用户-视频互动表tb\_user\_video\_log

| id | uid | video\_id | start\_time         | end\_time           | if\_follow | if\_like | if\_retweet | comment\_id |
| -- | --- | --------- | ------------------- | ------------------- | ---------- | -------- | ----------- | ----------- |
| 1  | 101 | 2001      | 2021-09-24 10:00:00 | 2021-09-24 10:00:30 | 1          | 1        | 1           | NULL        |
| 2  | 101 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:31 | 1          | 1        | 0           | NULL        |
| 3  | 102 | 2001      | 2021-10-01 10:00:00 | 2021-10-01 10:00:35 | 0          | 0        | 1           | NULL        |
| 4  | 103 | 2001      | 2021-10-03 11:00:50 | 2021-10-03 10:00:35 | 1          | 1        | 0           | 1732526     |
| 5  | 106 | 2002      | 2021-10-02 11:00:05 | 2021-10-02 11:01:04 | 2          | 0        | 1           | NULL        |
| 6  | 107 | 2002      | 2021-10-02 10:59:05 | 2021-10-02 11:00:06 | 1          | 0        | 0           | NULL        |
| 7  | 108 | 2002      | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1          | 1        | 1           | NULL        |
| 8  | 109 | 2002      | 2021-10-03 10:59:05 | 2021-10-03 11:00:01 | 0          | 1        | 0           | NULL        |
| 9  | 105 | 2002      | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1          | 0        | 1           | NULL        |
| 10 | 101 | 2003      | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1          | 0        | 0           | NULL        |
| 11 | 101 | 2003      | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1          | 1        | 0           | NULL        |

（uid-用户ID, video\_id-视频ID, start\_time-开始观看时间, end\_time-结束观看时间, if\_follow-是否关注, if\_like-是否点赞, if\_retweet-是否转发, comment\_id-评论ID）

短视频信息表tb\_video\_info

| id | video\_id | author | tag | duration | release\_time       |
| -- | --------- | ------ | --- | -------- | ------------------- |
| 1  | 2001      | 901    | 旅游  | 30       | 2021-09-05 07:00:00 |
| 2  | 2002      | 901    | 旅游  | 60       | 2021-09-05 07:00:00 |
| 3  | 2003      | 902    | 影视  | 90       | 2021-09-05 07:00:00 |
| 4  | 2004      | 902    | 影视  | 90       | 2021-09-05 08:00:00 |

（video\_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release\_time-发布时间）

**问题**：找出近一个月发布的视频中热度最高的top3视频。

**注**：

* 热度=(a*视频完播率+b*点赞数+c*评论数+d*转发数)\*新鲜度；
* 新鲜度=1/(最近无播放天数+1)；
* 当前配置的参数a,b,c,d分别为100、5、3、2。
* 最近播放日期以**end\_time-结束观看时间**为准，假设为T，则最近一个月按\[T-29, T]闭区间统计。
* 结果中热度保留为**整数**，并按热度**降序**排序。

**输出示例**：

示例数据的输出结果如下

| video\_id | hot\_index |
| --------- | ---------- |
| 2001      | 122        |
| 2002      | 56         |
| 2003      | 1          |

解释：

最近播放日期为2021-10-03，记作当天日期；近一个月（2021-09-04及之后）发布的视频有2001、2002、2003、2004，不过2004暂时还没有播放记录；

视频2001完播率1.0（被播放次数4次，完成播放4次），被点赞3次，评论1次，转发2次，最近无播放天数为0，因此热度为：(100*1.0+5*3+3*1+2*2)/(0+1)=122

同理，视频2003完播率0，被点赞数1，评论和转发均为0，最近无播放天数为3，因此热度为：(100*0+5*1+3*0+2*0)/(3+1)=1（1.2保留为整数）。

### 示例1

输入：

```
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:31', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:35', 0, 0, 1, null)
  ,(103, 2001, '2021-10-03 11:00:50', '2021-10-03 11:01:35', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:04', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:06', 1, 0, 0, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:01', 0, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 0, 1, null)
  ,(101, 2003, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 0, null)
  ,(101, 2003, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2021-09-05 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-09-05 7:00:00')
  ,(2003, 902, '影视', 90, '2021-09-05 7:00:00')
  ,(2004, 902, '影视', 90, '2021-09-05 8:00:00');
```

复制

输出：

```
2001|122
2002|56
2003|1
```

#### 解答

流程：

1. 先查询出每个用户第一次登陆时间（最小登陆时间）--每天新用户表
2. 因为涉及到跨天活跃，所以要进行并集操作，将登录时间和登出时间取并集，这里union会去重--用户活跃表
3. 将每天新用户表和用户活跃表左连接，只有是同一用户并且该用户第2天依旧登陆才会保留整个记录，否则右表记录为空
4. 得到每天新用户第二天是否登陆表后,开始计算每天的次日留存率：根据日期分组计算，次日活跃用户个数/当天新用户个数

```sql
select t1.dt, round(count(t2.uid) / count(t1.uid), 2) uv_rate
    from (select uid
                  ,
                 min(date(in_time)) dt
              from tb_user_log
              group by uid) as t1 -- 每天新用户表
             left join (select uid, date(in_time) dt
                            from tb_user_log
                        union
                        select uid, date(out_time)
                            from tb_user_log) as t2 -- 用户活跃表
                       on t1.uid = t2.uid
                           and t1.dt = date_sub(t2.dt, INTERVAL 1 day)
    where date_format(t1.dt, '%Y-%m') = '2021-11'
    group by t1.dt
    order by t1.dt
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://qiangrens-organization.gitbook.io/qkd90/shu-ju-ku-he-zhong-jian-jian/sql-yu-ju/156161-shi-pin.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
