MySQL and NoSQL: Help me choose the right one

You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes, which are only available with InnoDB!

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

Then design your system along the lines of the following simplified example.

Example schema (simplified)

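The important features are that the tables use the InnoDB engine and that the primary key of the threads table is no longer a single auto_incrementing key but a composite clustered key based on the combination of forum_id and thread_id, e.g.: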

threads - primary key (forum_id, thread_id)

forum_id    thread_id
========    =========
1                   1
1                   2
1                   3
1                 ...
1             2058300  
2                   1
2                   2
2                   3
2                  ...
2              2352141
...

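Each forum row includes a counter called next_thread_id (unsigned int), which is maintained by a trigger and increments every time a thread is added to that forum. This also means we can store 4 billion threads per forum, rather than the 4 billion threads in total we would get with a single auto_increment primary key on thread_id.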

forum_id    title   next_thread_id
========    =====   ==============
1          forum 1        2058300
2          forum 2        2352141
3          forum 3        2482805
4          forum 4        3740957
...
64        forum 64       3243097
65        forum 65      15000000 -- ooh a big one
66        forum 66       5038900
67        forum 67       4449764
...
247      forum 247            0 -- still loading data for half the forums !
248      forum 248            0
249      forum 249            0
250      forum 250            0

The disadvantage of using a composite key is that you can no longer select a thread by a single key value, like this:

select * from threads where thread_id = y;

Instead, you have to do:

select * from threads where forum_id = x and thread_id = y;

However, your application code should already know which forum the user is browsing, so this is not difficult to implement: store the currently viewed forum_id in a session variable, a hidden form field, etc.
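As a rough illustration of the lookup above, here is a minimal sketch using a server-side prepared statement. The statement name get_thread, the user variables and the example values are assumptions made up for this sketch; in practice the two values would come from your application (the remembered forum_id plus the requested thread_id).

```sql
-- hypothetical values: forum_id remembered by the application, thread_id requested by the user
set @forum_id = 65, @thread_id = 1234;

-- both key parts are supplied, so the lookup can use the composite primary key
prepare get_thread from
  'select * from threads where forum_id = ? and thread_id = ?';
execute get_thread using @forum_id, @thread_id;
deallocate prepare get_thread;
```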

Here's the simplified schema:

drop table if exists forums;
create table forums
(
forum_id smallint unsigned not null auto_increment primary key,
title varchar(255) unique not null,
next_thread_id int unsigned not null default 0 -- count of threads in each forum
)engine=innodb;


drop table if exists threads;
create table threads
(
forum_id smallint unsigned not null,
thread_id int unsigned not null default 0,
reply_count int unsigned not null default 0,
hash char(32) not null,
created_date datetime not null,
primary key (forum_id, thread_id, reply_count) -- composite clustered index
)engine=innodb;

delimiter #

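create trigger threads_before_ins_trig before insert on threads
for each row
begin
declare v_id int unsigned default 0;

  -- read the forum's counter and lock its row so concurrent inserts cannot pick the same id
  select next_thread_id + 1 into v_id from forums where forum_id = new.forum_id for update;
  -- assign the per-forum thread_id to the new row, then store the advanced counter
  set new.thread_id = v_id;
  update forums set next_thread_id = v_id where forum_id = new.forum_id;
end#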

delimiter ;
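To see the trigger at work, here is a small sketch that assumes the tables and trigger above have just been created and are empty. The forum titles and hash inputs are made-up sample values.

```sql
-- two sample forums; on a fresh table they get forum_id 1 and 2
insert into forums (title) values ('forum 1'), ('forum 2');

-- thread_id is omitted on purpose: the before-insert trigger fills it in
insert into threads (forum_id, hash, created_date) values (1, md5('thread a'), now());
insert into threads (forum_id, hash, created_date) values (1, md5('thread b'), now());
insert into threads (forum_id, hash, created_date) values (2, md5('thread c'), now());

-- thread ids restart at 1 within each forum: (1,1), (1,2), (2,1)
select forum_id, thread_id from threads order by forum_id, thread_id;

-- the counters now read 2 for forum 1 and 1 for forum 2
select forum_id, next_thread_id from forums order by forum_id;
```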

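You may have noticed that I've included reply_count as part of the primary key, which looks a bit odd because the (forum_id, thread_id) composite is unique in itself. This is just an index optimisation which saves some I/O when queries that use reply_count are executed. Please refer to the two links above for further details.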
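If you want to check how MySQL resolves a query that filters on reply_count, you can prefix it with EXPLAIN. The sketch below uses one of the queries from later in this answer; the plan you get depends on your MySQL version and data, so no output is shown here. The things to look at are the key column (which index was chosen) and the rows estimate.

```sql
-- ask the optimizer for its plan instead of running the query
explain
select forum_id, thread_id
from threads
where forum_id = 65 and reply_count > 64
order by thread_id desc
limit 32;
```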

Example queries

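I'm still loading data into my example tables, and so far I have loaded approximately 500 million rows (half of the system). When the load process is complete I expect to have approximately: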

250 forums * 5 million threads = 1,250,000,000 (1.25 billion rows)

I've deliberately made some of the forums contain more than 5 million threads; for example, forum 65 has 15 million threads:

forum_id    title   next_thread_id
========    =====   ==============
65        forum 65      15000000 -- ooh a big one

Query runtimes

select sum(next_thread_id) from forums;

sum(next_thread_id)
===================
539,155,433 (500 million threads so far and still growing...)

Under InnoDB, summing the next_thread_ids to get the total thread count is much faster than the usual:

select count(*) from threads;

How many threads does forum 65 have:

select next_thread_id from forums where forum_id = 65

next_thread_id
==============
15,000,000 (15 million)

Again, this is faster than the usual:

select count(*) from threads where forum_id = 65

OK, so now we know that we have about 500 million threads so far and that forum 65 has 15 million threads. Let's see how the schema performs :)

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 64 order by thread_id desc limit 32;

runtime = 0.022 secs

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 1 order by thread_id desc limit 10000, 100;

runtime = 0.027 secs

Looks pretty performant to me: this is a single table with 500+ million rows (and growing), and a query that covers 15 million rows in 0.02 seconds (while under load!)

Further optimisations

These would include:

partitioning by range
sharding
throwing money and hardware at it

etc...
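As a rough sketch of the first option, the threads table could be range-partitioned on forum_id, which is allowed here because forum_id is part of the primary key. The partition names and boundary values below are assumptions chosen only for illustration; real boundaries would depend on how your threads are distributed across forums. Note that running this against an already loaded table rebuilds it, so it is better decided before the bulk load.

```sql
-- hypothetical layout: four partitions splitting the 250 forums into ranges of forum_id
alter table threads
partition by range (forum_id) (
  partition p0 values less than (65),    -- forums 1..64
  partition p1 values less than (129),   -- forums 65..128
  partition p2 values less than (193),   -- forums 129..192
  partition p3 values less than maxvalue -- forums 193 and up
);
```

Queries that filter on forum_id, like the ones above, can then be limited by the optimizer to a single partition.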

Hope you find this answer helpful :)