
PostgreSQL query runs faster with an index scan, but the engine chooses a hash join

My guess is that you are using the default random_page_cost = 4, which is too high and makes the index scan look more expensive than it really is.
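To confirm that guess before changing anything, the current planner cost settings can be inspected. This is a minimal sketch using the standard `pg_settings` view; it only reads values and changes nothing.

```sql
-- Show the page-cost settings the planner is currently using,
-- and where each value comes from (default, configuration file, database, session, ...).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('random_page_cost', 'seq_page_cost');
```

If `random_page_cost` shows 4 with source `default`, the server is running with the stock value.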

I tried to rebuild the two tables using this script:

    CREATE TABLE replays_game (
        id integer NOT NULL,
        PRIMARY KEY (id)
    );
    
    CREATE TABLE replays_playeringame (
        player_id integer NOT NULL,
        game_id integer NOT NULL,
        PRIMARY KEY (player_id, game_id),
        CONSTRAINT replays_playeringame_game_fkey
            FOREIGN KEY (game_id) REFERENCES replays_game (id)
    );
    
    CREATE INDEX ix_replays_playeringame_game_id
        ON replays_playeringame (game_id);
    
    -- 150k games
    INSERT INTO replays_game
    SELECT generate_series(1, 150000);
    
    -- ~150k players, ~2 games each
    INSERT INTO replays_playeringame
    select trunc(random() * 149999 + 1), generate_series(1, 150000);
    
    INSERT INTO replays_playeringame
    SELECT *
    FROM
        (
            SELECT
                trunc(random() * 149999 + 1) as player_id,
                generate_series(1, 150000) as game_id
        ) AS t
    WHERE
        NOT EXISTS (
            SELECT 1
            FROM replays_playeringame
            WHERE
                t.player_id = replays_playeringame.player_id
                AND t.game_id = replays_playeringame.game_id
        )
    ;
    
    -- the heavy player with 3000 games
    INSERT INTO replays_playeringame
    select 999999, generate_series(1, 3000);
    

With the default value of 4:

    game=# set random_page_cost = 4;
    SET
    game=# explain analyse SELECT "replays_game".*
    FROM "replays_game"
    INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
    WHERE "replays_playeringame"."player_id" = 999999;
                                                                         QUERY PLAN                                                                      
    -----------------------------------------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=1483.54..4802.54 rows=3000 width=4) (actual time=3.640..110.212 rows=3000 loops=1)
       Hash Cond: (replays_game.id = replays_playeringame.game_id)
       ->  Seq Scan on replays_game  (cost=0.00..2164.00 rows=150000 width=4) (actual time=0.012..34.261 rows=150000 loops=1)
       ->  Hash  (cost=1446.04..1446.04 rows=3000 width=4) (actual time=3.598..3.598 rows=3000 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 106kB
             ->  Bitmap Heap Scan on replays_playeringame  (cost=67.54..1446.04 rows=3000 width=4) (actual time=0.586..2.041 rows=3000 loops=1)
                   Recheck Cond: (player_id = 999999)
                   ->  Bitmap Index Scan on replays_playeringame_pkey  (cost=0.00..66.79 rows=3000 width=0) (actual time=0.560..0.560 rows=3000 loops=1)
                         Index Cond: (player_id = 999999)
     Total runtime: 110.621 ms
    

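After lowering it to 2, the planner switches to a nested loop with an index scan, and the runtime drops from about 110 ms to about 28 ms: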

    game=# set random_page_cost = 2;
    SET
    game=# explain analyse SELECT "replays_game".*
    FROM "replays_game"
    INNER JOIN "replays_playeringame" ON "replays_game"."id" = "replays_playeringame"."game_id"
    WHERE "replays_playeringame"."player_id" = 999999;
                                                                      QUERY PLAN                                                                   
    -----------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=45.52..4444.86 rows=3000 width=4) (actual time=0.418..27.741 rows=3000 loops=1)
       ->  Bitmap Heap Scan on replays_playeringame  (cost=45.52..1424.02 rows=3000 width=4) (actual time=0.406..1.502 rows=3000 loops=1)
             Recheck Cond: (player_id = 999999)
             ->  Bitmap Index Scan on replays_playeringame_pkey  (cost=0.00..44.77 rows=3000 width=0) (actual time=0.388..0.388 rows=3000 loops=1)
                   Index Cond: (player_id = 999999)
       ->  Index Scan using replays_game_pkey on replays_game  (cost=0.00..0.99 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=3000)
             Index Cond: (id = replays_playeringame.game_id)
     Total runtime: 28.542 ms
    (8 rows)
    

If you are using an SSD, I would lower it further, to 1.1.
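The `SET` commands in the sessions above only last until the connection closes. A minimal sketch of making the value stick, assuming the database is named `game` (inferred from the `game=#` prompt) and that 1.1 is the value you settle on after testing:

```sql
-- Persist the setting for one database; it applies to sessions opened afterwards.
-- "game" is an assumed database name taken from the psql prompt above.
ALTER DATABASE game SET random_page_cost = 1.1;

-- Alternatively, scope the change to a single transaction while experimenting.
BEGIN;
SET LOCAL random_page_cost = 1.1;
EXPLAIN ANALYZE
SELECT g.*
FROM replays_game AS g
INNER JOIN replays_playeringame AS p ON g.id = p.game_id
WHERE p.player_id = 999999;
COMMIT;

-- Undo the per-database override if it turns out not to help.
ALTER DATABASE game RESET random_page_cost;
```

The per-database form is used here because it needs no access to the server configuration file; whether 1.1 is right for your storage is something to verify against your own query timings.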

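As for your last question, I really think you should stick with PostgreSQL. I have experience with both PostgreSQL and MSSQL, and I have to put three times the effort into the latter to get it to perform half as well as the former.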


