MySQL Slow Query Log: A Bit More Complicated Than You Might Expect

1. Problem Overview

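When analyzing the Slow Query Log, you may find a recorded SQL statement that clearly performs a full table scan, yet the Rows_sent and Rows_examined values in the log do not match the table's actual row count and can be off by many multiples. Another detail: when the same statement is executed several times and leaves several entries in the slow log, Rows_sent and Rows_examined also differ noticeably from one entry to the next.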

What causes this?

2. An Example

Suppose there is a table named product_stock that holds 80201010 rows in total and is about 56 GB in size.

Running count(*) against the whole table leaves the following entry in the slow log:

# Time: 2019-06-06T13:51:22.111111+08:00
# User@Host: hehe[hehe] @ localhost [] Id: 868686
# Query_time: 39.112233 Lock_time: 0.000333 Rows_sent: 1 Rows_examined: 80201010
SET timestamp .....;
select count(*) from product_stock;

However, the slow SQL triggered by the application was recorded as follows:

# Time: 2019-06-05T14:22:22.111222+08:00
# User@Host: uwser[uwser] @ [XX.XX.XX.XX] Id: 667766
# Query_time: 520.662233 Lock_time: 0.000296 Rows_sent: 820111 Rows_examined: 820111
SET timestamp .....;
select * from product_stock where 1=1;

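Note: the where 1=1 is appended automatically by the application framework so that the SQL statement is never left without a where clause. It is harmless.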

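The statement is certainly a full table scan, so why does the recorded number of examined rows cover only a small part of the table? There is no limit clause either.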
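To put a number on "a small part", the Rows_examined values of the two entries above can be compared against the table's row count. This is a minimal Python sketch; the figures are copied from the log entries, and the helper name scanned_fraction is chosen here purely for illustration.

```python
TABLE_ROWS = 80201010  # total rows in product_stock, as stated above


def scanned_fraction(rows_examined, table_rows=TABLE_ROWS):
    """Share of the table that the slow log reports as examined."""
    return rows_examined / table_rows


# Rows_examined values taken from the two slow log entries above
for label, rows_examined in [("count(*)", 80201010), ("where 1=1", 820111)]:
    print(f"{label}: {scanned_fraction(rows_examined):.2%} of the table examined")
```

The count(*) entry accounts for the whole table, while the application's statement accounts for roughly 1% of it.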

3. How the Official Documentation Defines the Slow Log

The slow query log consists of SQL statements that take more than long_query_time seconds to execute and require at least min_examined_row_limit rows to be examined. The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization.

The time to acquire the initial locks is not counted as execution time. mysqld writes a statement to the slow query log after it has been executed and after all locks have been released, so log order might differ from execution order.
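The rule quoted above can be written as a small predicate. This is only a sketch of the quoted wording, not of the server's actual code path, which applies further conditions not shown here. The function name is hypothetical; the default values mirror MySQL's defaults for long_query_time (10 seconds) and min_examined_row_limit (0).

```python
def is_slow_log_candidate(query_time, lock_time, rows_examined,
                          long_query_time=10.0, min_examined_row_limit=0):
    """Apply only the rule quoted above: slow enough and enough rows examined."""
    # Per the quoted text, time spent acquiring the initial locks is not counted.
    execution_time = query_time - lock_time
    return (execution_time > long_query_time
            and rows_examined >= min_examined_row_limit)


# The count(*) entry from section 2: 39.1 s, 80201010 rows examined
print(is_slow_log_candidate(39.112233, 0.000333, 80201010))
# A hypothetical fast statement for contrast
print(is_slow_log_candidate(0.5, 0.0001, 100))
```

Both inputs to the rule are measured for the statement as it actually ran: elapsed time and rows examined. The size of the underlying table does not appear in it.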

• Query_time: duration

The statement execution time in seconds.

• Lock_time: duration

The time to acquire locks in seconds.

• Rows_sent: N

The number of rows sent to the client.

• Rows_examined:

The number of rows examined by the server layer (not counting any processing internal to storage engines).
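For comparing many entries, the four fields described above can be pulled out of a header line with a short Python sketch. The function name and the regular expression are illustrative assumptions; the pattern tolerates a missing or space-separated colon, since pasted or sanitized log lines sometimes lose them.

```python
import re

# Matches "Name: value" pairs in a slow log statistics line, e.g.
# "# Query_time: 39.112233  Lock_time: 0.000333 Rows_sent: 1  Rows_examined: 80201010"
FIELD_RE = re.compile(
    r"(Query_time|Lock_time|Rows_sent|Rows_examined)\s*:?\s*([0-9.]+)"
)


def parse_slow_log_stats(line):
    """Return the timing and row-count fields found in one header line."""
    stats = {}
    for name, value in FIELD_RE.findall(line):
        # The *_time fields are durations in seconds; the others are row counts.
        stats[name] = float(value) if name.endswith("_time") else int(value)
    return stats


line = "# Query_time: 39.112233  Lock_time: 0.000333 Rows_sent: 1  Rows_examined: 80201010"
print(parse_slow_log_stats(line))
```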

None of this directly answers the question, so we need to keep digging.

4. Hypothesis

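The slow log records only part of the table's rows. Could it be that the statement never finished? Was it cancelled while still running, so that only part of the table was scanned and the rows returned are just those scanned so far?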

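For example, suppose the full scan would take 10 minutes if allowed to complete, but 1 minute in, a connection parameter setting or the client itself cancels the statement, so only 1/10 of the scan is done. The statement is still recorded in the slow log even though it did not run to completion.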

5. Verifying the Hypothesis

To keep the verification simple and direct, we connect with a local mysql client.

5.1 Cancelling the statement while it is running

Of course, by the time of the cancel, the statement must already have been running longer than the configured slow query threshold.

Here is one of the resulting slow log entries:

# Time: 2019-06-06T18:36:18.554477+08:00
# User@Host: uwser[uwser] @ [XX.XX.XX.XX] Id: 842366
# Query_time: 20.662233 Lock_time: 0.000296 Rows_sent: 3691064 Rows_examined: 3691064
SET timestamp .....;
select * from product_stock

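After the cancel, the statement is still written to the slow log, and only the rows scanned so far are returned (3691064 of 80201010). This case confirms the hypothesis.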

5.2 Killing the statement while it is running

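Again, by the time of the kill (simply open a new connection and issue the kill from there), the statement must already have been running longer than the configured slow query threshold.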

# Time: 2019-06-06T19:12:10.553322+08:00
# User@Host: uwser[uwser] @ [XX.XX.XX.XX] Id: 842366
# Query_time: 50.662233 Lock_time: 0.000456 Rows_sent: 10121006 Rows_examined: 10121006
SET timestamp .....;
select * from product_stock
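As a rough cross-check, the two interrupted runs from 5.1 and 5.2 can be turned into an average scan rate and extrapolated to the whole table. The Python sketch below assumes the rate stays roughly constant over a full scan, which the source does not confirm. The function names are illustrative, and the figures are copied from the log entries above.

```python
TABLE_ROWS = 80201010  # total rows in product_stock, from section 2

# (label, Query_time in seconds, Rows_examined), copied from the entries above
interrupted_runs = [
    ("5.1 cancel", 20.662233, 3691064),
    ("5.2 kill", 50.662233, 10121006),
]


def scan_rate(query_time, rows_examined):
    """Average rows examined per second over the life of the statement."""
    return rows_examined / query_time


def estimated_full_scan_seconds(query_time, rows_examined, table_rows=TABLE_ROWS):
    """Extrapolate to the whole table, assuming the rate stays constant."""
    return table_rows / scan_rate(query_time, rows_examined)


for label, query_time, rows_examined in interrupted_runs:
    rate = scan_rate(query_time, rows_examined)
    full = estimated_full_scan_seconds(query_time, rows_examined)
    print(f"{label}: {rate:,.0f} rows/s, full scan would take about {full:,.0f} s")
```

Both runs work out to a little under 200,000 rows per second, which puts a complete scan at several hundred seconds under this assumption. That is consistent with statements interrupted after 20 and 50 seconds having covered only a fraction of the table.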