
Detecting duplicate English names

    While indexing, apply a SynonymFilter to firstName so that every possible combination (Bob -> Robert, Robert -> Bob, and so on) ends up in the index. Index your existing users this way.
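
To make the index-time expansion concrete, here is a minimal sketch in plain Java with no Lucene involved. The class `SynonymIndexSketch`, its methods, and the toy inverted index are hypothetical names made up for illustration; only the synonym groups and first names come from the answer's own test data. It shows why a query for "Bob" can later find a user stored as "Robert": every member of a synonym group is written into the index for that document.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not the Lucene SynonymFilter.
class SynonymIndexSketch {
    // Synonym groups copied from the names used in the answer's code.
    static final List<List<String>> GROUPS = Arrays.asList(
            Arrays.asList("Bob", "Bobby", "Robert"),
            Arrays.asList("William", "Will", "Bill", "Billy"));

    // Terms that would be indexed for one first name:
    // the name itself plus every member of its synonym group.
    static Set<String> expand(String firstName) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(firstName);
        for (List<String> group : GROUPS) {
            if (group.contains(firstName)) {
                terms.addAll(group);
            }
        }
        return terms;
    }

    // Toy inverted index: term -> ids of the documents that contain it.
    static Map<String, Set<Integer>> index(List<String> firstNames) {
        Map<String, Set<Integer>> postings = new LinkedHashMap<>();
        for (int id = 0; id < firstNames.size(); id++) {
            for (String term : expand(firstNames.get(id))) {
                postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(id);
            }
        }
        return postings;
    }

    public static void main(String[] args) {
        List<String> firstNames =
                Arrays.asList("William", "Robert", "Bobby", "Will", "Anton");
        Map<String, Set<Integer>> postings = index(firstNames);
        // An unexpanded query term is enough, because the index holds all variants.
        System.out.println("Bob  -> " + postings.get("Bob"));
        System.out.println("Bill -> " + postings.get("Bill"));
    }
}
```

Because the expansion happens once at index time, the query side can stay simple: it looks up the literal term the user typed and still reaches every variant of the name.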

    Then run a fuzzy query through a QueryParser whose analyzer does not include the SynonymFilter.
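
For intuition about what the fuzzy part of the query does, the sketch below computes a Levenshtein edit distance and turns it into a similarity score. The formula `1 - distance / min(length)` is my understanding of how Lucene 3.x scored classic fuzzy queries when no prefix length is set; treat it as an assumption rather than the library's exact code. The class and method names are hypothetical.

```java
// Hypothetical illustration of edit-distance based fuzzy matching.
class FuzzySimilaritySketch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring: 1 - distance / length of the shorter string.
    static double similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        if (shorter == 0) {
            return queryTerm.length() == indexedTerm.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(queryTerm, indexedTerm) / shorter;
    }

    public static void main(String[] args) {
        // The fuzzy term is compared in lowercase against the untouched indexed value.
        System.out.println(editDistance("wolliam", "Williams"));
        System.out.println(similarity("wolliam", "Williams") > 0.5);
    }
}
```

Under this assumption, "wolliam" is three edits away from the indexed value "Williams" (two substitutions, counting the case difference in the first letter, plus one insertion), which gives a similarity of about 0.57 and clears a 0.5 threshold.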

    This is the code I came up with:

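    import java.io.IOException;
    import java.io.Reader;
    import java.util.List;

    import com.google.common.collect.HashMultimap;
    import com.google.common.collect.ImmutableList;
    import com.google.common.collect.Multimap;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    // Note: SynonymFilter and SynonymEngine are not part of the Lucene 3.0 core.
    // They are assumed here to be helper classes available in your project
    // (for example, the ones from the "Lucene in Action" sample code); import
    // them from wherever they live.

    public class NameDuplicateTests {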
        private Analyzer analyzer;
        private IndexSearcher searcher;
        private IndexReader reader;
        private QueryParser qp;
    
        private final static Multimap<String, String> firstNameSynonyms;
        static {
            firstNameSynonyms = HashMultimap.create();
            List<String> robertSynonyms = ImmutableList.of("Bob", "Bobby", "Robert");
            for (String name: robertSynonyms) {
                firstNameSynonyms.putAll(name, robertSynonyms);
            }
            List<String> willSynonyms = ImmutableList.of("William", "Will", "Bill", "Billy");
            for (String name: willSynonyms) {
                firstNameSynonyms.putAll(name, willSynonyms);
            }
        }
    
        public static Analyzer createAnalyzer() {
            return new Analyzer() {
                @Override
                public TokenStream tokenStream(String fieldName, Reader reader) {
                    TokenStream tokenizer = new WhitespaceTokenizer(reader);
                    if (fieldName.equals("firstName")) {
                        tokenizer = new SynonymFilter(tokenizer, new SynonymEngine() {
                            @Override
                            public String[] getSynonyms(String s) throws IOException {
                                return firstNameSynonyms.get(s).toArray(new String[0]);
                            }
                        });
                    }
                    return tokenizer;
                }
            };
        }
    
    
        @Before
        public void setUp() throws Exception {
            Directory dir = new RAMDirectory();
            analyzer = createAnalyzer();
    
            IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
            ImmutableList<String> firstNames = ImmutableList.of("William", "Robert", "Bobby", "Will", "Anton");
            ImmutableList<String> lastNames = ImmutableList.of("Robert", "Williams", "Mayor", "Bob", "FunkyMother");
    
            for (int id = 0; id < firstNames.size(); id++) {
                Document doc = new Document();
                doc.add(new Field("id", String.valueOf(id), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("firstName", firstNames.get(id), Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("lastName", lastNames.get(id), Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            writer.close();
    
            qp = new QueryParser(Version.LUCENE_30, "firstName", new WhitespaceAnalyzer());
            searcher = new IndexSearcher(dir);
            reader = searcher.getIndexReader();
        }
    
        @After
        public void tearDown() throws Exception {
            searcher.close();
        }
    
        @Test
        public void testNameFilter() throws Exception {
            search("+firstName:Bob +lastName:Williams");
            search("+firstName:Bob +lastName:Wolliam~");
        }
    
        private void search(String query) throws ParseException, IOException {
            Query q = qp.parse(query);
            System.out.println(q);
            TopDocs res = searcher.search(q, 3);
            for (ScoreDoc sd: res.scoreDocs) {
                Document doc = reader.document(sd.doc);
                System.out.println("Found " + doc.get("firstName") + " " + doc.get("lastName"));
            }
        }
    }
    

    The output looks like this:

    +firstName:Bob +lastName:Williams
    Found Robert Williams
    +firstName:Bob +lastName:wolliam~0.5
    Found Robert Williams
    

    Hope this helps.



