第23期：索引设计（组合索引不适用场景改造）

2021-03-10发布 930浏览

关键字：MySQL数据库迁移、MySQL数据库设计、MySQL数据库调优

回顾下组合索引的语法和必备条件

【组合索引的语法】（只讨论默认升序排列）

alter table t1 add idx_multi(f1, f2, f3);

【必备条件】列 f1 必须存在于 SQL 语句过滤条件中！

也就是说组合索引的第一个列（最左列）在过滤条件中必须存在，而且最好是等值过滤。看看下面这些 SQL，没有一款适合组合索引。

# SQL 1
select * from t1 where f2 = 1;

# SQL 2
select * from t1 where f3 = 1;

# SQL 3
select * from t1 where f2 = 1 and f3 = 1;

# SQL 4
select * from t1 where f2 = 1 or f3 = 1;

# SQL 5
select * from t1 where f1 = 1 or f2 = 1;

# SQL 6
select * from t1 where f1 = 1 or f3 = 1;

# SQL 7
select * from t1 where f1 = 1 or f2 = 1 or f3 = 1;

那接下来对上面的 SQL 一个一个分析：

SQL 1 过滤条件只有一列 f2，是组合索引 idx_multi 的第二列，由于通过列 f2 找不到索引的入口，没法直接使用索引，针对这样的语句只有在列 f2 上建一个单值索引。

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f2(f2);
Query OK, 0 rows affected (0.32 sec)
Records: 0  Duplicates: 0  Warnings: 0

加索引前后的执行计划对比，

加索引 idx_f2 之前，全表扫描，没有任何可用的索引。

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 32194
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

加了索引 idx_f2 之后，直接通过索引 idx_f2 来过滤记录，

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2
          key: idx_f2
      key_len: 5
          ref: const
         rows: 351
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

SQL 2 和 SQL 1 一样，可以对列 f3 加一个单值索引。

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f3(f3);
Query OK, 0 rows affected (0.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

接下来看 SQL 3，SQL 3 的过滤条件为列 (f2, f3)，并且为等值过滤。也不符合组合索引 idx_multi 的必备特征，列 f1 没有被包含到过滤条件里。不过之前对列 f2 和列 f3 分别建立了单值索引，可以利用 MySQL 的 Index merge 特性来合并这两个索引取交集，查询计划如下：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3
          key: idx_f3,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using intersect(idx_f3,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

可以看到，MySQL 用了 Index_merge 来过滤数据。不过如果这样的语句出现很频繁的话，建议还是加一个仅包含列 f2,f3 的组合索引。

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_multi_sub(f2,f3);
Query OK, 0 rows affected (0.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

执行计划如下：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_multi_sub
      key_len: 10
          ref: const,const
         rows: 2
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

可以看到，MySQL 选择了索引 idx_multi_sub 来过滤数据，而且 type 类型为 ref 相比 index_merge 来的更优化。

再来看看 SQL 4，SQL 4 的过滤条件也是列 f2,f3，不过两列的关系不是 AND，而是 OR，并且也不包含列 f1，那这样完全不符合组合索引的特征。

基于这样的语句，如果给定的过滤条件 f2, f3 过滤性能还可以的话(之后我会重新开篇讲索引的过滤性能)，完全可以给两列分别建单值索引，MySQL 会对这样的查询用 index_merge 对两个单值索引做一个合并过滤，之前已经建过了，看看执行计划：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_f2,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 684
     filtered: 100.00
        Extra: Using union(idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

有人说，把 OR 改成 UNION ALL 可以避免这种操作，不过也要分场合，比如上面这条 SQL，index_merge 只是简单对索引 idx_f2、idx_f3 做了一个 union ，没有排序，这时候 OR 和 UNION ALL 性能差不了多少。至于更多的 OR 和 UNION ALL 的优化会另外开篇，不在本篇范围之内，就不多做介绍了。

SQL 5、SQL 6 这两条 SQL 完全可以用到组合索引 idx_multi，因为列 f1 包含在过滤条件里，不过由于其他的条件不是 and 而是 or，所以也需要给 or 后的列加单值索引，这样 MySQL 会把索引 idx_multi 和其他单值索引做一个 index_merge 来优化。来分别看看这两条 SQL 的执行计划：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub
          key: idx_multi,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f2); Using where
1 row in set, 1 warning (0.01 sec)

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f3
          key: idx_multi,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 645
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

可以看到，MySQL同时用到了索引 idx_multi、idx_f2 和 idx_multi、idx_f3。但是两个执行计划的 index_merge 都用 sort_union ，能不能把 sort_union 变为普通的非排序的 union 呢？这时候就要给列 f1 加单值索引，

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f1(f1);
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

来随机看下 SQL 5 的执行计划：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

给列 f1 加了单列索引后，index_merge 就避免了sort_union。

SQL 7 和上面两条 SQL 一样，得单独用到列 f1 的单值索引，不同的是 index_merge 是对三个列进行而不是两个列。执行计划如下：

(127.0.0.1:3400)|(ytt)>explain
    -> select * from t1 where f1 = 1 or f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_f3,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2,idx_f3
      key_len: 5,5,5
          ref: NULL
         rows: 996
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

可以看到可以很清晰的看到，MySQL 用了三个单值索引的 index_merge。

其实到现在表 t1 已经有很多索引了，

(127.0.0.1:3400)|(ytt)>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `f1` int DEFAULT NULL,
  `f2` int DEFAULT NULL,
  `f3` int DEFAULT NULL,
  `f4` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_multi` (`f1`,`f2`,`f3`),
  KEY `idx_f2` (`f2`),
  KEY `idx_f3` (`f3`),
  KEY `idx_multi_sub` (`f2`,`f3`),
  KEY `idx_f1` (`f1`)
) ENGINE=InnoDB  mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

从以上表结构可以看到，对于我们这 7 条 SQL，除了主键，还多建了 5 个二级索引。实际场景中也不是非得把这些索引全给建上，主要还得看这条语句的运行频率，如果以上哪条 SQL 运行频率较低并且允许时间与业务高峰期不冲突，可以移除这条 SQL 对应的索引来减少索引对表写入的延迟。

回顾下组合索引的语法和必备条件

【组合索引的语法】（只讨论默认升序排列）

alter table t1 add idx_multi(f1, f2, f3);

【必备条件】列 f1 必须存在于 SQL 语句过滤条件中！

也就是说组合索引的第一个列（最左列）在过滤条件中必须存在，而且最好是等值过滤。看看下面这些 SQL，没有一款适合组合索引。

# SQL 1
select * from t1 where f2 = 1;

# SQL 2
select * from t1 where f3 = 1;

# SQL 3
select * from t1 where f2 = 1 and f3 = 1;

# SQL 4
select * from t1 where f2 = 1 or f3 = 1;

# SQL 5
select * from t1 where f1 = 1 or f2 = 1;

# SQL 6
select * from t1 where f1 = 1 or f3 = 1;

# SQL 7
select * from t1 where f1 = 1 or f2 = 1 or f3 = 1;

那接下来对上面的 SQL 一个一个分析：

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f2(f2);
Query OK, 0 rows affected (0.32 sec)
Records: 0  Duplicates: 0  Warnings: 0

加索引前后的执行计划对比，

加索引 idx_f2 之前，全表扫描，没有任何可用的索引。

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 32194
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

加了索引 idx_f2 之后，直接通过索引 idx_f2 来过滤记录，

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2
          key: idx_f2
      key_len: 5
          ref: const
         rows: 351
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

SQL 2 和 SQL 1 一样，可以对列 f3 加一个单值索引。

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f3(f3);
Query OK, 0 rows affected (0.35 sec)
Records: 0  Duplicates: 0  Warnings: 0

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3
          key: idx_f3,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using intersect(idx_f3,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

可以看到，MySQL 用了 Index_merge 来过滤数据。不过如果这样的语句出现很频繁的话，建议还是加一个仅包含列 f2,f3 的组合索引。

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_multi_sub(f2,f3);
Query OK, 0 rows affected (0.50 sec)
Records: 0  Duplicates: 0  Warnings: 0

执行计划如下：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 and f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_multi_sub
      key_len: 10
          ref: const,const
         rows: 2
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

可以看到，MySQL 选择了索引 idx_multi_sub 来过滤数据，而且 type 类型为 ref 相比 index_merge 来的更优化。

再来看看 SQL 4，SQL 4 的过滤条件也是列 f2,f3，不过两列的关系不是 AND，而是 OR，并且也不包含列 f1，那这样完全不符合组合索引的特征。

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_f2,idx_f3,idx_multi_sub
          key: idx_f2,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 684
     filtered: 100.00
        Extra: Using union(idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub
          key: idx_multi,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f2); Using where
1 row in set, 1 warning (0.01 sec)

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f3
          key: idx_multi,idx_f3
      key_len: 5,5
          ref: NULL
         rows: 645
     filtered: 100.00
        Extra: Using sort_union(idx_multi,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

(127.0.0.1:3400)|(ytt)>alter table t1 add key idx_f1(f1);
Query OK, 0 rows affected (0.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

来随机看下 SQL 5 的执行计划：

(127.0.0.1:3400)|(ytt)>explain select * from t1 where f1 = 1 or f2 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2
      key_len: 5,5
          ref: NULL
         rows: 663
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2); Using where
1 row in set, 1 warning (0.00 sec)

给列 f1 加了单列索引后，index_merge 就避免了sort_union。

SQL 7 和上面两条 SQL 一样，得单独用到列 f1 的单值索引，不同的是 index_merge 是对三个列进行而不是两个列。执行计划如下：

(127.0.0.1:3400)|(ytt)>explain
    -> select * from t1 where f1 = 1 or f2 = 1 or f3 = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: index_merge
possible_keys: idx_multi,idx_f2,idx_f3,idx_multi_sub,idx_f1
          key: idx_f1,idx_f2,idx_f3
      key_len: 5,5,5
          ref: NULL
         rows: 996
     filtered: 100.00
        Extra: Using union(idx_f1,idx_f2,idx_f3); Using where
1 row in set, 1 warning (0.00 sec)

可以看到可以很清晰的看到，MySQL 用了三个单值索引的 index_merge。

其实到现在表 t1 已经有很多索引了，

(127.0.0.1:3400)|(ytt)>show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL,
  `f1` int DEFAULT NULL,
  `f2` int DEFAULT NULL,
  `f3` int DEFAULT NULL,
  `f4` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_multi` (`f1`,`f2`,`f3`),
  KEY `idx_f2` (`f2`),
  KEY `idx_f3` (`f3`),
  KEY `idx_multi_sub` (`f2`,`f3`),
  KEY `idx_f1` (`f1`)
) ENGINE=InnoDB  mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

关键字：MySQL数据库迁移、MySQL数据库设计、MySQL数据库调优

技术分享 | XtraBackup 备份加速

故障分析 | Java 连接 MySQL 8.0 排错案例

2024-04-22 发布 636 浏览

爱可生携手OceanBase共建生态繁荣，2024开发者大会分享《ActionDB生态体系建设与实践经验》

2024-03-28 发布 784 浏览

36氪发布AIGC行业报告，爱可生荣登向量数据库领域代表企业榜单

2024-02-02 发布 1,039 浏览

爱可生荣获证券基金行业信息技术应用创新联盟年度优秀成员奖

2024-02-01 发布 1,330 浏览

爱可生获得OceanBase特别认可“客户第一”，2023年度重点项目交付合作再创佳绩

2024-01-31 发布 3,287 浏览

信通院云大所大数据和区块链部向爱可生致谢，共铸业内首个《向量数据库技术要求》