Elasticsearch on multiple fields with partial and full matches

- April 15, 2013

Our Account model has first_name, last_name, and ssn (social security number).

I want partial matches on first_name and last_name, but an exact match on ssn. This is what I have so far:

    settings analysis: {
      filter: {
        substring: {
          type: "ngram",
          min_gram: 3,
          max_gram: 50
        },
        ssn_string: {
          type: "ngram",
          min_gram: 9,
          max_gram: 9
        },
      },
      analyzer: {
        index_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "substring"]
        },
        search_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["lowercase", "substring"]
        },
        ssn_ngram_analyzer: {
          type: "custom",
          tokenizer: "standard",
          filter: ["ssn_string"]
        },
      }
    } do
      mapping do
        # Partial matching on the name fields via ngrams
        [:first_name, :last_name].each do |attribute|
          indexes attribute, type: 'string',
                             index_analyzer: 'index_ngram_analyzer',
                             search_analyzer: 'search_ngram_analyzer'
        end
        # Exact matching on ssn: the value is indexed as a single term
        indexes :ssn, type: 'string', index: 'not_analyzed'
      end
    end

My search is as follows:

    query: {
      multi_match: {
        fields: ["first_name", "last_name", "ssn"],
        query: query,
        type: "cross_fields",
        operator: "and"
      }
    }

So this works:

    Account.search("erik").records.to_a

And this (for Erik Smith):

    Account.search("erik smi").records.to_a

And searching by ssn:

    Account.search("111112222").records.to_a

But this does not:

    Account.search("erik 111112222").records.to_a

Any idea whether I am indexing or querying wrong?

Thanks for any help!

Does it have to be done with a single query string? If not, I would do this:

    PUT /test_index
    {
       "settings": {
          "number_of_shards": 1,
          "analysis": {
             "filter": {
                "ngram_filter": {
                   "type": "ngram",
                   "min_gram": 2,
                   "max_gram": 20
                }
             },
             "analyzer": {
                "ngram_analyzer": {
                   "type": "custom",
                   "tokenizer": "standard",
                   "filter": [
                      "lowercase",
                      "ngram_filter"
                   ]
                }
             }
          }
       },
       "mappings": {
          "doc": {
             "_all": {
                "enabled": true,
                "index_analyzer": "ngram_analyzer",
                "search_analyzer": "standard"
             },
             "properties": {
                "first_name": {
                   "type": "string",
                   "include_in_all": true
                },
                "last_name": {
                   "type": "string",
                   "include_in_all": true
                },
                "ssn": {
                   "type": "string",
                   "index": "not_analyzed",
                   "include_in_all": false
                }
             }
          }
       }
    }

Notice the use of the _all field. I included first_name and last_name in _all, but not ssn, and ssn is not analyzed at all since I want exact matches against it.
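To see why a partial term such as "eri" can match under this mapping, here is a minimal pure-Python sketch of the idea. It is not Elasticsearch's implementation: the helper names (`ngrams`, `index_terms`, `matches_all`) are hypothetical, and the standard tokenizer is approximated by splitting on whitespace. The index side expands each word into ngrams of length 2 to 20, while the search side keeps whole words, mirroring the `ngram_analyzer` / `standard` pairing on _all.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text):
    """Approximate the index-time analyzer: split on whitespace, lowercase, ngram."""
    terms = set()
    for word in text.lower().split():
        terms |= ngrams(word)
    return terms


def matches_all(query, indexed_terms):
    """Approximate a match query with operator "and": every query word must be indexed."""
    return all(word in indexed_terms for word in query.lower().split())


# _all for document 1 holds first_name and last_name, but not ssn
all_field = index_terms("erik smith")

print(matches_all("eri smi", all_field))     # True
print(matches_all("erik jones", all_field))  # False
print(matches_all("111112222", all_field))   # False, ssn is not part of _all
```

Because the search side is not ngrammed, a query word only has to equal one of the indexed ngrams, which keeps short query fragments from matching unrelated names.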

I indexed a couple of documents for illustration:

    POST /test_index/doc/_bulk
    {"index":{"_id":1}}
    {"first_name":"erik","last_name":"smith","ssn":"111112222"}
    {"index":{"_id":2}}
    {"first_name":"bob","last_name":"jones","ssn":"123456789"}

Then I can query for partial names, and filter by the exact ssn:

    POST /test_index/doc/_search
    {
       "query": {
          "filtered": {
             "query": {
                "match": {
                   "_all": {
                      "query": "eri smi",
                      "operator": "and"
                   }
                }
             },
             "filter": {
                "term": {
                   "ssn": "111112222"
                }
             }
          }
       }
    }

And I get back what I'm expecting:

    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 0.8838835,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.8838835,
                "_source": {
                   "first_name": "erik",
                   "last_name": "smith",
                   "ssn": "111112222"
                }
             }
          ]
       }
    }
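If the application only has one search box, one possible way to keep using the query-plus-filter approach is to split the user's input on the client before building the request. The sketch below is an assumption on my part rather than something from the setup above: `build_search_body` is a hypothetical helper, and it assumes an ssn is always typed as nine bare digits. It only builds the request body and does not talk to Elasticsearch.

```python
import json
import re


def build_search_body(user_input):
    """Split input into name words and an ssn, then build the search body."""
    words = user_input.split()
    # Assumption: any word made of exactly nine digits is an ssn
    ssns = [w for w in words if re.fullmatch(r"\d{9}", w)]
    names = [w for w in words if not re.fullmatch(r"\d{9}", w)]

    if names:
        # Partial-name match against the ngram-analyzed _all field
        query = {"match": {"_all": {"query": " ".join(names), "operator": "and"}}}
    else:
        query = {"match_all": {}}

    if not ssns:
        return {"query": query}

    # Exact match on the not_analyzed ssn field; only the first ssn is used
    return {
        "query": {
            "filtered": {
                "query": query,
                "filter": {"term": {"ssn": ssns[0]}},
            }
        }
    }


print(json.dumps(build_search_body("eri smi 111112222"), indent=2))
```

For the input "eri smi 111112222" this produces the same body as the request shown above. Note that the `filtered` query matches the older Elasticsearch versions this mapping syntax targets; later releases replaced it with a `bool` query and a `filter` clause.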

If you need to be able to search with a single query string (no filter), you could include ssn in the _all field as well, but with this setup it will also match on partial strings (like 111112), which may not be what you want.

If you only want to match prefixes (i.e., search terms that start at the beginning of words), you should use edge ngrams.
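The difference between the two token filters can be sketched in a few lines of plain Python. This is an illustration of the concept only, with hypothetical helper names and the same min/max lengths (2 and 20) as the `ngram_filter` above; it does not call Elasticsearch.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[start:start + size]
            for size in range(min_gram, longest + 1)
            for start in range(len(token) - size + 1)]


def edge_ngrams(token, min_gram=2, max_gram=20):
    """Only the prefixes of `token` with length in [min_gram, max_gram]."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("erik"))           # ['er', 'eri', 'erik']
print("rik" in ngrams("erik"))       # True: ngrams match anywhere in the word
print("rik" in edge_ngrams("erik"))  # False: edge ngrams match prefixes only
```

Edge ngrams also produce far fewer terms per word, so the index stays smaller when prefix matching is all that is needed.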

I wrote a blog post about using ngrams that might help you out a little: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

Here is the code I used for this answer. I tried a few different things, including the setup I posted here, and another one including ssn in _all, but with edge ngrams. Hope that helps:

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f
