3 Ways to Return a Random Sample of Documents from a MongoDB Collection

If you need to return a small sample of random documents from a collection, here are three approaches you can try using the aggregation pipeline.

The `$sample` Stage

The `$sample` aggregation pipeline stage is designed specifically for randomly selecting a given number of documents.

When using `$sample`, you specify the number of documents to return in the `size` field.

Suppose we have the following collection called `pets`:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

We can use `$sample` to get a random sample of these documents like this:

db.pets.aggregate(
   [
      { 
        $sample: { size: 3 } 
      }
   ]
)

Result:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }

In this case, I specified `{ size: 3 }`, so three documents were returned.
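To make the `size` semantics concrete, here is a minimal in-memory sketch in plain JavaScript (Node.js) that picks `size` distinct documents at random from an array holding the nine `pets` documents from above. The helper name `sampleDocs` is hypothetical, and this is only an analogue of the "random sort" behavior, not MongoDB's implementation; in particular it never returns duplicates, which `$sample` itself can.

```javascript
// The nine documents from the pets collection, as plain objects.
const pets = [
  { _id: 1, name: "Wag", type: "Dog", weight: 20 },
  { _id: 2, name: "Bark", type: "Dog", weight: 10 },
  { _id: 3, name: "Meow", type: "Cat", weight: 7 },
  { _id: 4, name: "Scratch", type: "Cat", weight: 8 },
  { _id: 5, name: "Bruce", type: "Bat", weight: 3 },
  { _id: 6, name: "Hop", type: "Kangaroo", weight: 130 },
  { _id: 7, name: "Punch", type: "Gorilla", weight: 300 },
  { _id: 8, name: "Snap", type: "Crocodile", weight: 400 },
  { _id: 9, name: "Flutter", type: "Hummingbird", weight: 1 },
];

// Hypothetical helper: an in-memory analogue of { $sample: { size } }.
// Returns `size` distinct documents chosen uniformly at random
// (or all of them if `size` exceeds the array length).
function sampleDocs(docs, size) {
  const copy = docs.slice(); // work on a copy so the input order is untouched
  const n = Math.min(size, copy.length);
  // Partial Fisher-Yates shuffle: each pass swaps a random remaining
  // element into position i, so copy[0..n) is a uniform random selection.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(Math.random() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

console.log(sampleDocs(pets, 3)); // three random pets, different on each run
```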

Here it is again, using a different sample size:

db.pets.aggregate(
   [
      { 
        $sample: { size: 5 } 
      }
   ]
)

Result:

{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }

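The `$sample` stage works in one of two ways, depending on the number of documents in the collection, the sample size relative to that number, and the stage's position in the pipeline. See the MongoDB `$sample` documentation for an explanation of how it works.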
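As a rough guide, the MongoDB documentation (in its 4.4-era wording) describes the choice like this: `$sample` uses a pseudo-random cursor only when it is the first stage of the pipeline, the sample size is less than 5% of the documents in the collection, and the collection contains more than 100 documents; otherwise it reads all of its input and performs a random sort to select the documents. The sketch below encodes those conditions in a hypothetical `sampleStrategy` function. Treat it as an illustration and check the documentation for the version you run, since these details can change between releases.

```javascript
// Hypothetical helper encoding the conditions described in the MongoDB
// $sample documentation (4.4-era wording); verify against your version's docs.
function sampleStrategy({ isFirstStage, collectionCount, size }) {
  const usesRandomCursor =
    isFirstStage && // $sample is the first stage of the pipeline
    size < 0.05 * collectionCount && // sample is under 5% of the collection
    collectionCount > 100; // collection holds more than 100 documents
  // Otherwise $sample reads all of its input and does a random sort.
  return usesRandomCursor ? "pseudo-random cursor" : "random sort";
}

// The pets examples: 9 documents, size 3, $sample as the only stage.
console.log(sampleStrategy({ isFirstStage: true, collectionCount: 9, size: 3 })); // random sort
```

With only nine documents, every `pets` example in this article falls into the random-sort path.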

It's also possible for the `$sample` stage to return the same document more than once in its result set.
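If duplicates matter for your use case, one option is to discard repeats on the client side after the query returns. The sketch below assumes the results have already been fetched into an array (for example with `toArray()` on the aggregation cursor); `dedupeById` is a hypothetical helper. Keep in mind that dropping repeats can leave you with fewer than `size` documents.

```javascript
// Hypothetical helper: keep only the first occurrence of each _id.
// Assumes all _id values share one type (so 1 and "1" never coexist).
function dedupeById(docs) {
  const seen = new Set();
  return docs.filter((doc) => {
    // Key on the string form so ObjectId values (distinct objects) compare by value.
    const key = String(doc._id);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Example input with a repeated document, as $sample might return it.
const results = [{ _id: 6 }, { _id: 5 }, { _id: 6 }, { _id: 4 }];
console.log(dedupeById(results).map((d) => d._id)); // [ 6, 5, 4 ]
```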

The `$rand` Operator

The `$rand` operator was introduced in MongoDB 4.4.2. Its purpose is to return a random float between 0 and 1 each time it is called.

Therefore, it can be used in the `$match` stage, in combination with other operators such as `$expr` and `$lt`, to return a random sample of documents.

Example:

db.pets.aggregate(
   [
      { 
        $match: 
          { 
            $expr: 
              { 
                $lt: [ 0.5, { $rand: {} } ] 
              }
          } 
      }
   ]
)

Result:

{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }
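The `$match` stage above keeps a document whenever the random number generated for it is greater than 0.5. A minimal plain-JavaScript analogue, with `Math.random()` standing in for `$rand` and a hypothetical `matchRandAbove` helper, might look like this:

```javascript
// The nine documents from the pets collection, as plain objects.
const pets = [
  { _id: 1, name: "Wag", type: "Dog", weight: 20 },
  { _id: 2, name: "Bark", type: "Dog", weight: 10 },
  { _id: 3, name: "Meow", type: "Cat", weight: 7 },
  { _id: 4, name: "Scratch", type: "Cat", weight: 8 },
  { _id: 5, name: "Bruce", type: "Bat", weight: 3 },
  { _id: 6, name: "Hop", type: "Kangaroo", weight: 130 },
  { _id: 7, name: "Punch", type: "Gorilla", weight: 300 },
  { _id: 8, name: "Snap", type: "Crocodile", weight: 400 },
  { _id: 9, name: "Flutter", type: "Hummingbird", weight: 1 },
];

// Hypothetical helper mirroring
//   { $match: { $expr: { $lt: [ threshold, { $rand: {} } ] } } }
// Math.random() plays the role of $rand: a new number is drawn for every document.
function matchRandAbove(docs, threshold) {
  return docs.filter(() => threshold < Math.random());
}

console.log(matchRandAbove(pets, 0.5)); // a different subset (and size) on each run
```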

The result set from this approach differs from the `$sample` approach in that it doesn't return a fixed number of documents. The number of documents returned varies from run to run.
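Because each document passes the filter independently with probability 0.5, the count follows a binomial distribution: anywhere from 0 to 9 documents for the `pets` collection, averaging 9 × 0.5 = 4.5. The short simulation below uses a hypothetical `countPassing` helper to stand in for one run of the query, so you can watch the count change between runs without touching a database.

```javascript
// Hypothetical helper: simulate one run of the $rand filter over n documents,
// where each document independently passes with probability p.
function countPassing(n, p) {
  let count = 0;
  for (let i = 0; i < n; i++) {
    if (Math.random() < p) count++; // one independent draw per document
  }
  return count;
}

const n = 9; // documents in the pets collection
const p = 0.5; // chance that 0.5 < $rand holds for a given document
const runs = Array.from({ length: 5 }, () => countPassing(n, p));
console.log(runs); // five counts between 0 and 9; they differ from run to run
console.log("expected average:", n * p); // expected average: 4.5
```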

For example, here's what I got when I ran the same code a few more times:

Result set 2:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

Result set 3:

{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

Result set 4:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

Result set 5:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

The `$sampleRate` Operator

Introduced in MongoDB 4.4.2, the `$sampleRate` operator provides a more concise way of doing the same thing as the previous example.

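When using `$sampleRate`, you provide the sample rate as a floating point number between 0 and 1. The selection process uses a uniform random distribution, and the sample rate you provide represents the probability that a given document will be selected as it passes through the pipeline.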
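Put another way, each document gets its own independent random draw and is kept with probability equal to the rate. A minimal in-memory sketch, assuming a hypothetical `sampleRate` helper and using `Math.random()` as the uniform source, might look like this. The range check is this sketch's own choice, not a description of how MongoDB reports invalid rates.

```javascript
// The nine documents from the pets collection, as plain objects.
const pets = [
  { _id: 1, name: "Wag", type: "Dog", weight: 20 },
  { _id: 2, name: "Bark", type: "Dog", weight: 10 },
  { _id: 3, name: "Meow", type: "Cat", weight: 7 },
  { _id: 4, name: "Scratch", type: "Cat", weight: 8 },
  { _id: 5, name: "Bruce", type: "Bat", weight: 3 },
  { _id: 6, name: "Hop", type: "Kangaroo", weight: 130 },
  { _id: 7, name: "Punch", type: "Gorilla", weight: 300 },
  { _id: 8, name: "Snap", type: "Crocodile", weight: 400 },
  { _id: 9, name: "Flutter", type: "Hummingbird", weight: 1 },
];

// Hypothetical helper: an in-memory analogue of { $match: { $sampleRate: rate } }.
// Each document is kept independently with probability `rate`.
function sampleRate(docs, rate) {
  // This sketch rejects rates outside [0, 1] (and NaN) up front.
  if (!(rate >= 0 && rate <= 1)) {
    throw new RangeError("rate must be between 0 and 1");
  }
  // Math.random() is uniform on [0, 1), so P(Math.random() < rate) === rate.
  return docs.filter(() => Math.random() < rate);
}

console.log(sampleRate(pets, 0.5)); // roughly half the pets, varying per run
```

With a rate of 0.5 this behaves like the `$rand` filter in the previous section, since both keep each document with probability 0.5.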

Example:

db.pets.aggregate(
   [
      { 
        $match: { $sampleRate: 0.5 } 
      }
   ]
)

Result:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 5, "name" : "Bruce", "type" : "Bat", "weight" : 3 }
{ "_id" : 6, "name" : "Hop", "type" : "Kangaroo", "weight" : 130 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }

And here it is when I run it again:

{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 4, "name" : "Scratch", "type" : "Cat", "weight" : 8 }
{ "_id" : 7, "name" : "Punch", "type" : "Gorilla", "weight" : 300 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
{ "_id" : 9, "name" : "Flutter", "type" : "Hummingbird", "weight" : 1 }

And again:

{ "_id" : 1, "name" : "Wag", "type" : "Dog", "weight" : 20 }
{ "_id" : 2, "name" : "Bark", "type" : "Dog", "weight" : 10 }
{ "_id" : 3, "name" : "Meow", "type" : "Cat", "weight" : 7 }
{ "_id" : 8, "name" : "Snap", "type" : "Crocodile", "weight" : 400 }
