HAVING句でエイリアスを使用できるようにすることのパフォーマンスへの影響

その特定のクエリだけに焦点を絞り、サンプルデータを以下にロードします。これは、count(distinct ...)などの他のクエリにも対応しています。他の人が言及した。

HAVINGのalias in the HAVING （クエリに応じて）その代替案よりもわずかに優れているか、かなり優れているように見えます。

これは、この回答を介してすばやく作成された約500万行の既存のテーブルを使用します私の場合は3〜5分かかります。

結果の構造：

CREATE TABLE `ratings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `thing` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

ただし、代わりにINNODBを使用します。範囲予約の挿入により、予想されるINNODBギャップ異常が作成されます。言っているだけですが、違いはありません。 470万行。

ティムの想定スキーマに近づくようにテーブルを変更します。

rename table ratings to students; -- not exactly instanteous (a COPY)
alter table students add column camId int; -- get it near Tim's schema
-- don't add the `camId` index yet

以下はしばらく時間がかかります。チャンクで何度も実行しないと、接続がタイムアウトする可能性があります。タイムアウトは、updateステートメントにLIMIT句がない500万行が原因です。注： LIMIT句があります。

したがって、50万行の反復でそれを実行しています。列を1〜20のランダムな数値に設定します

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

camIdがなくなるまで、上記を実行し続けます nullです。

10回ほど実行しました（全体で7〜10分かかります）

select camId,count(*) from students
group by camId order by 1 ;

1   235641
2   236060
3   236249
4   235736
5   236333
6   235540
7   235870
8   236815
9   235950
10  235594
11  236504
12  236483
13  235656
14  236264
15  236050
16  236176
17  236097
18  235239
19  235556
20  234779

select count(*) from students;
-- 4.7 Million rows

有用なインデックスを作成します（もちろん挿入後）。

create index `ix_stu_cam` on students(camId); -- takes 45 seconds

ANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
-- the above is fine, takes 1 second

キャンパステーブルを作成します。

create table campus
(   camID int auto_increment primary key,
    camName varchar(100) not null
);
insert campus(camName) values
('one'),('2'),('3'),('4'),('5'),
('6'),('7'),('8'),('9'),('ten'),
('etc'),('etc'),('etc'),('etc'),('etc'),
('etc'),('etc'),('etc'),('etc'),('twenty');
-- ok 20 of them

2つのクエリを実行します：

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING COUNT(students.id) > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

および

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentCount > 3 
ORDER BY studentCount; 
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

したがって、時間は同じです。それぞれを12回実行しました。

EXPLAIN 出力は両方で同じです

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
| id | select_type | table    | type | possible_keys | key        | key_len | ref                  | rows   | Extra                           |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
|  1 | SIMPLE      | campus   | ALL  | PRIMARY       | NULL       | NULL    | NULL                 |     20 | Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref  | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                     |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

AVG（）関数を使用すると、havingのエイリアスでパフォーマンスが約12％向上します。（同一の EXPLAIN 出力）次の2つのクエリから。

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING avg(students.id) > 2200000 
ORDER BY students.camID; 
-- avg time 7.5

explain 

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID, campus.camName 
HAVING studentAvg > 2200000
ORDER BY students.camID;
-- avg time 6.5

そして最後に、DISTINCT ：

SELECT students.camID, count(distinct students.id) as studentDistinct 
FROM students 
JOIN campus 
    ON campus.camID = students.camID 
GROUP BY students.camID 
HAVING count(distinct students.id) > 1000000 
ORDER BY students.camID; -- 10.6   10.84   12.1   11.49   10.1   9.97   10.27   11.53   9.84 9.98
-- 9.9

 SELECT students.camID, count(distinct students.id) as studentDistinct 
 FROM students 
 JOIN campus 
    ON campus.camID = students.camID 
 GROUP BY students.camID 
 HAVING studentDistinct > 1000000 
 ORDER BY students.camID; -- 6.81    6.55   6.75   6.31   7.11 6.36   6.55
-- 6.45

のエイリアスは一貫して35％高速 同じ EXPLAIN 出力。以下を参照してください。したがって、同じExplain出力は、同じパフォーマンスではなく、一般的な手がかりとして2回示されています。

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table    | type  | possible_keys | key        | key_len | ref                  | rows   | Extra                                        |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | campus   | index | PRIMARY       | PRIMARY    | 4       | NULL                 |     20 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | students | ref   | ix_stu_cam    | ix_stu_cam | 5       | bigtest.campus.camID | 123766 | Using index                                  |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

オプティマイザーは、特にDISTINCT.の場合、現時点でエイリアスを優先しているように見えます。