Elasticsearch on multiple fields with partial and full matches


Our Account model has first_name, last_name, and ssn (social security number).

I want partial matches on first_name and last_name, but an exact match on ssn. This is what I have so far:

    settings analysis: {
      filter: {
        substring: {
          type: "ngram",
          min_gram: 3,
          max_gram: 50
        },
        ssn_string: {
          type: "ngram",
          min_gram: 9,
          max_gram: 9
        },
      },
      analyzer: {
        index_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "substring"]
        },
        search_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "substring"]
        },
        ssn_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["ssn_string"]
        },
      }
    }

    mapping do
      # Partial matching: both name fields are indexed and searched with ngrams.
      [:first_name, :last_name].each do |attribute|
        indexes attribute, type: 'string',
                           index_analyzer: 'index_ngram_analyzer',
                           search_analyzer: 'search_ngram_analyzer'
      end

      # Exact matching: ssn is stored as a single unanalyzed term.
      indexes :ssn, type: 'string', index: 'not_analyzed'
    end

My search is as follows:

    query: {
      multi_match: {
        fields: ["first_name", "last_name", "ssn"],
        query: query,
        type: "cross_fields",
        operator: "and"
      }
    }

So this works:

    Account.search("erik").records.to_a

And this (for erik smith):

    Account.search("erik smi").records.to_a

And the ssn:

    Account.search("111112222").records.to_a

But not:

    Account.search("erik 111112222").records.to_a

Any idea if I am indexing or querying wrong?

Thank you for any help!

Does it have to be done with a single query string? If not, I would do this:

    PUT /test_index
    {
       "settings": {
          "number_of_shards": 1,
          "analysis": {
             "filter": {
                "ngram_filter": {
                   "type": "ngram",
                   "min_gram": 2,
                   "max_gram": 20
                }
             },
             "analyzer": {
                "ngram_analyzer": {
                   "type": "custom",
                   "tokenizer": "standard",
                   "filter": [
                      "lowercase",
                      "ngram_filter"
                   ]
                }
             }
          }
       },
       "mappings": {
          "doc": {
             "_all": {
                "enabled": true,
                "index_analyzer": "ngram_analyzer",
                "search_analyzer": "standard"
             },
             "properties": {
                "first_name": {
                   "type": "string",
                   "include_in_all": true
                },
                "last_name": {
                   "type": "string",
                   "include_in_all": true
                },
                "ssn": {
                   "type": "string",
                   "index": "not_analyzed",
                   "include_in_all": false
                }
             }
          }
       }
    }

Notice the use of the _all field. I included first_name and last_name in _all, but not ssn, and ssn is not analyzed at all since I want exact matches against it.

I indexed a couple of documents for illustration:

    POST /test_index/doc/_bulk
    {"index":{"_id":1}}
    {"first_name":"erik","last_name":"smith","ssn":"111112222"}
    {"index":{"_id":2}}
    {"first_name":"bob","last_name":"jones","ssn":"123456789"}

Then I can query for partial names, and filter by the exact ssn:

    POST /test_index/doc/_search
    {
       "query": {
          "filtered": {
             "query": {
                "match": {
                   "_all": {
                      "query": "eri smi",
                      "operator": "and"
                   }
                }
             },
             "filter": {
                "term": {
                   "ssn": "111112222"
                }
             }
          }
       }
    }

And I get back what I'm expecting:

    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 0.8838835,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.8838835,
                "_source": {
                   "first_name": "erik",
                   "last_name": "smith",
                   "ssn": "111112222"
                }
             }
          ]
       }
    }
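If the application can separate the ssn from the name terms before querying, the request body above could be assembled in Ruby roughly like this. This is only a sketch: the helper name `build_account_query` and the rule that any token of exactly nine digits is treated as an ssn are assumptions for illustration, not part of the original setup.

```ruby
require 'json'

# Hypothetical helper: splits a raw search string into name terms and an
# optional ssn, then builds a filtered query like the one shown above.
def build_account_query(raw)
  tokens = raw.to_s.split

  # Assumption: a token made of exactly nine digits is an ssn.
  ssns, names = tokens.partition { |t| t.match?(/\A\d{9}\z/) }

  # Name terms go against the ngram-analyzed _all field; with no name
  # terms, fall back to match_all so that only the filter applies.
  name_query =
    if names.empty?
      { match_all: {} }
    else
      { match: { _all: { query: names.join(' '), operator: 'and' } } }
    end

  filtered = { query: name_query }
  # The ssn is matched exactly with a term filter on the not_analyzed field.
  filtered[:filter] = { term: { ssn: ssns.first } } unless ssns.empty?

  { query: { filtered: filtered } }
end

puts JSON.pretty_generate(build_account_query('eri smi 111112222'))
```

With elasticsearch-model, a hash like this can typically be passed straight to the model's search method, for example `Account.search(build_account_query("erik 111112222"))`, assuming the index uses the mapping from this answer.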

If you need to be able to search with a single query string (no filter), you could include ssn in the _all field as well, but with this setup it will match on partial strings (like 111112), which may not be what you want.
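To see why a partial ssn would match in that case, here is a plain-Ruby approximation of what an ngram filter with min_gram 2 and max_gram 20 would index for an ssn value. The `ngrams` helper is hypothetical and only mimics the idea; it is not how Elasticsearch implements the filter.

```ruby
# Hypothetical helper: every substring of `token` whose length lies
# between min and max, similar in spirit to an ngram token filter.
def ngrams(token, min, max)
  (min..[max, token.length].min).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

indexed = ngrams('111112222', 2, 20)

# The search side uses the standard analyzer, so a query such as "111112"
# stays a single term and is looked up among the indexed ngrams.
p indexed.include?('111112')     # partial ssn is among the indexed terms
p indexed.include?('111112222')  # so is the full ssn
```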

If you only want to match prefixes (i.e., search terms that start at the beginning of words), you should use edge ngrams.
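As a rough illustration of the difference, an edge ngram filter keeps only the substrings anchored at the start of each token. The `edge_ngrams` helper below is a hypothetical plain-Ruby stand-in for that behaviour, using the same 2 to 20 length range as the filter above.

```ruby
# Hypothetical helper: prefixes of `token` whose length lies between
# min and max, similar in spirit to an edge ngram token filter.
def edge_ngrams(token, min, max)
  (min..[max, token.length].min).map { |n| token[0, n] }
end

terms = edge_ngrams('smith', 2, 20)

p terms                  # ["sm", "smi", "smit", "smith"]
p terms.include?('smi')  # true: a prefix of the word
p terms.include?('mit')  # false: an inner substring is not indexed
```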

I wrote a blog post about using ngrams that might help you out a little: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

Here is the code I used for this answer. I tried a few different things, including the setup I posted here, and including ssn in _all, but with edge ngrams. Hope that helps:

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f

