Percentage of matched OR conditions in MongoDB

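The solution really needs to be MongoDB-specific. Otherwise you end up doing the calculation, and possibly the matching, on the client side, which hurts performance.

So what you really want is a way to do that processing on the server side: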

db.products.aggregate([

    // Match the documents that meet your conditions
    { "$match": {
        "$or": [
            { 
                "features": { 
                    "$elemMatch": {
                       "key": "Screen Format",
                       "value": "16:9"
                    }
                }
            },
            { 
                "features": { 
                    "$elemMatch": {
                       "key" : "Weight in kg",
                       "value" : { "$gt": "5", "$lt": "8" }
                    }
                }
            },
        ]
    }},

    // Keep the document and a copy of the features array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "product_id": "$product_id",
            "ean": "$ean",
            "brand": "$brand",
            "model": "$model",
            "features": "$features"
        },
        "features": 1
    }},

    // Unwind the array
    { "$unwind": "$features" },

    // Find the actual elements that match the conditions
    { "$match": {
        "$or": [
            { 
               "features.key": "Screen Format",
               "features.value": "16:9"
            },
            { 
               "features.key" : "Weight in kg",
               "features.value" : { "$gt": "5", "$lt": "8" }
            },
        ]
    }},

    // Count those matched elements
    { "$group": {
        "_id": "$_id",
        "count": { "$sum": 1 }
    }},

    // Restore the document and divide the matched elements by the
    // number of elements in the "or" condition
    { "$project": {
        "_id": "$_id._id",
        "product_id": "$_id.product_id",
        "ean": "$_id.ean",
        "brand": "$_id.brand",
        "model": "$_id.model",
        "features": "$_id.features",
        "matched": { "$divide": [ "$count", 2 ] }
    }},

    // Sort by the matched percentage
    { "$sort": { "matched": -1 } }

])

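Since you know the "length" of the $or condition being applied, you only need to find out how many elements in the "features" array match those conditions. That is what the second $match in the pipeline is for.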
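Because the divisor has to equal the number of $or conditions, one way to keep the two in sync is to generate the pipeline from a single list of conditions. The following is only a minimal sketch: the helper name `buildMatchedPipeline` and the `{ key, value }` input shape are assumptions made for illustration, while the stages themselves mirror the pipeline above.

```javascript
// Hypothetical helper: builds the pipeline from one list of feature
// conditions so the divisor always equals the number of "$or" branches.
function buildMatchedPipeline(conditions) {
  if (conditions.length === 0) {
    throw new Error("at least one condition is required");
  }

  // Document-level filter: some array element satisfies a condition.
  const documentMatch = conditions.map(function (c) {
    return { features: { $elemMatch: { key: c.key, value: c.value } } };
  });

  // Element-level filter, applied to each element after $unwind.
  const elementMatch = conditions.map(function (c) {
    return { "features.key": c.key, "features.value": c.value };
  });

  return [
    { $match: { $or: documentMatch } },
    { $project: {
        _id: {
          _id: "$_id", product_id: "$product_id", ean: "$ean",
          brand: "$brand", model: "$model", features: "$features"
        },
        features: 1
    }},
    { $unwind: "$features" },
    { $match: { $or: elementMatch } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $project: {
        _id: "$_id._id",
        product_id: "$_id.product_id",
        ean: "$_id.ean",
        brand: "$_id.brand",
        model: "$_id.model",
        features: "$_id.features",
        // The divisor is derived from the same list as the $or branches.
        matched: { $divide: ["$count", conditions.length] }
    }},
    { $sort: { matched: -1 } }
  ];
}

const pipeline = buildMatchedPipeline([
  { key: "Screen Format", value: "16:9" },
  { key: "Weight in kg", value: { $gt: "5", $lt: "8" } }
]);
// In the mongo shell: db.products.aggregate(pipeline)
```

Adding a third condition to the list then changes both $match stages and the divisor in one place.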

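Once you have that count, you simply divide it by the number of conditions passed in the $or, so a value of 1 means every condition matched. The advantage is that you can now do useful things with the result, such as sorting by relevance and even "paging" the results on the server side.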
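Server-side paging can be sketched as two extra stages appended after the final $sort. The helper name `pagingStages` and the 1-based `page` parameter are assumptions for illustration; `$skip` and `$limit` are standard aggregation stages.

```javascript
// Hypothetical helper: returns the stages to append after the $sort stage.
// "page" is 1-based; "pageSize" is the number of documents per page.
function pagingStages(page, pageSize) {
  return [
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize }
  ];
}

const thirdPage = pagingStages(3, 20);
// → [ { $skip: 40 }, { $limit: 20 } ]
```

Since many products are likely to share the same "matched" value, adding a tiebreaker such as `_id` to the $sort keys may be worth considering so that page boundaries stay deterministic.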

Of course, if you want some additional "categorization" of this, it is simply a matter of adding another $project stage to the end of the pipeline:

    { "$project": {
        "product_id": 1,
        "ean": 1,
        "brand": 1,
        "model": 1,
        "features": 1,
        "matched": 1,
        "category": { "$cond": [
            { "$eq": [ "$matched", 1 ] },
            "100",
            { "$cond": [
                { "$gte": [ "$matched", 0.7 ] },
                "70-99",
                { "$cond": [
                   { "$gte": [ "$matched", 0.4 ] },
                   "40-69",
                   "under 40"
                ]}
            ]}
        ]}
    }}

Or something similar. Either way, the $cond operator can help you here.
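If the buckets are likely to change, the nested $cond expression can be generated rather than written by hand. This is a sketch under assumptions: `buildCategoryExpr` and the `{ min, label }` bucket shape are hypothetical, and it uses `$gte` for every bucket, including the top one where the stage above uses `$eq`.

```javascript
// Hypothetical helper: turns an ordered list of buckets (highest bound
// first) into nested $cond expressions, ending in a fallback label.
function buildCategoryExpr(field, buckets, fallbackLabel) {
  // Build from the innermost "else" branch outwards.
  return buckets.reduceRight(function (elseExpr, bucket) {
    return {
      $cond: [ { $gte: [field, bucket.min] }, bucket.label, elseExpr ]
    };
  }, fallbackLabel);
}

const category = buildCategoryExpr("$matched", [
  { min: 1, label: "100" },
  { min: 0.7, label: "70-99" },
  { min: 0.4, label: "40-69" }
], "under 40");
// Use as: { "$project": { ..., "category": category } }
```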

The architecture should be fine as you have it: you can put a compound index on the "key" and "value" of the entries in your features array, and that should scale well for queries.
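Such an index could be declared roughly as follows. This sketch assumes the `products` collection used in the pipeline; `ensureFeatureIndex` is a hypothetical wrapper that accepts any object exposing `createIndex`, such as a mongo shell collection.

```javascript
// Compound index over both fields of each "features" array entry.
const featureIndexSpec = { "features.key": 1, "features.value": 1 };

// Hypothetical wrapper; in the mongo shell the direct equivalent is
//   db.products.createIndex(featureIndexSpec)
function ensureFeatureIndex(collection) {
  return collection.createIndex(featureIndexSpec);
}
```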

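Of course, if you actually need more than that, such as faceted search and results, you can look at solutions like Solr or Elasticsearch. A full implementation of that would be a bit lengthy to cover here.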