python - Get word frequency in Elasticsearch with a large number of documents
I have been trying to get word frequency in Elasticsearch, using the elasticsearch and elasticsearch-dsl Python clients.
Here is my code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(["my_ip_machine:port"])
s = Search(using=client, index=settings.es_index) \
    .filter("term", content=keyword) \
    .filter("term", provider=json_input["media"]) \
    .filter("range", **{"publish": {"from": begin, "to": end}})
s.aggs.bucket("group_by_state", "terms", field="content")
result = s.execute()
I run the code and the output is this (I modified the output to be more concise):
{ "word1": 8, "word2": 8, "word3": 6, "word4": 4, }
The code runs without problems against 2,000 documents in Elasticsearch on my laptop. But I ran into a problem when running it on a DigitalOcean droplet: there are >2,000,000 documents in Elasticsearch and the droplet only has 1 GB of RAM. Every time I run the code, memory usage increases and Elasticsearch shuts down.
Is there a (more efficient) way to get the word frequency in Elasticsearch with this many documents? An answer as a raw Elasticsearch query is not a problem; I can convert it to the DSL myself.
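For reference, the DSL code above should build roughly the following raw query body (a sketch only; the exact shape depends on the elasticsearch-dsl version, and the literal values are placeholders standing in for keyword, json_input["media"], begin, and end):

# Rough raw-query equivalent of the DSL search above (a sketch, not the
# exact output of the client); all literal values are placeholders.
query_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content": "some_keyword"}},
                {"term": {"provider": "some_media"}},
                {"range": {"publish": {"from": "2016-01-01", "to": "2016-12-31"}}},
            ]
        }
    },
    "aggs": {
        "group_by_state": {"terms": {"field": "content"}}
    }
}
# A body like this can be sent directly with client.search(index=..., body=query_body).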
Thank you :)
When I ran into this problem I had to go here for the answer:

Elasticsearch query to return all records

You need to grab the documents in chunks, let's say 2,000 at a time, then loop over them and make multiple queries.
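Here is a minimal sketch of that approach with the low-level Python client, assuming the same index, field names, and filter values as in the question (the placeholder values below only stand in for keyword, json_input["media"], begin, end, and settings.es_index) and assuming content is a plain text field. elasticsearch.helpers.scan pages through the matching documents with the scroll API in batches, so only one batch is held in memory at a time, and the word counting happens on the client side:

from collections import Counter

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

client = Elasticsearch(["my_ip_machine:port"])

# Placeholders; in the question these values come from settings,
# json_input, keyword, begin and end.
index_name = "my_index"
keyword = "some_keyword"
media = "some_media"
begin, end = "2016-01-01", "2016-12-31"

# Same filters as in the question, but without the terms aggregation;
# the counting is done client-side instead.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content": keyword}},
                {"term": {"provider": media}},
                {"range": {"publish": {"from": begin, "to": end}}},
            ]
        }
    }
}

word_counts = Counter()
# scan() wraps the scroll API and yields hits in batches of `size`
# documents, so memory use stays roughly flat however many documents match.
for hit in scan(client, index=index_name, query=query, size=2000):
    word_counts.update(hit["_source"]["content"].split())

print(word_counts.most_common(10))

With the DSL client you can get the same streaming behaviour from Search.scan(), which also uses the scroll API, so the filters from the question's code can be reused as-is.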