python - Get word frequency in Elasticsearch with a large number of documents
I have been trying to get word frequency in Elasticsearch, using the elasticsearch and elasticsearch-dsl Python clients.
Here is my code:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch(["my_ip_machine:port"])
s = Search(using=client, index=settings.es_index) \
    .filter("term", content=keyword) \
    .filter("term", provider=json_input["media"]) \
    .filter("range", **{"publish": {"from": begin, "to": end}})
s.aggs.bucket("group_by_state", "terms", field="content")
result = s.execute()
I run the code and the output is this (I modified the output to be more concise):
{ "word1": 8, "word2": 8, "word3": 6, "word4": 4, }
The code runs without problems against 2,000 documents in Elasticsearch on my laptop. But I ran into a problem when running it on a DigitalOcean droplet: there are >2,000,000 documents in Elasticsearch and the droplet only has 1 GB of RAM. Every time I run the code, memory usage increases and Elasticsearch shuts down.
Is there a (more efficient) way to get the word frequency in Elasticsearch with this many documents? An answer as a raw Elasticsearch query is not a problem; I can convert it to the DSL myself.
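For reference, the DSL code above should build roughly the following raw query body (a sketch only; the exact shape depends on the elasticsearch-dsl version, and the literal values are placeholders standing in for keyword, json_input["media"], begin, and end):

# Rough raw-query equivalent of the DSL search above (a sketch, not the
# exact output of the client); all literal values are placeholders.
query_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content": "some_keyword"}},
                {"term": {"provider": "some_media"}},
                {"range": {"publish": {"from": "2016-01-01", "to": "2016-12-31"}}},
            ]
        }
    },
    "aggs": {
        "group_by_state": {"terms": {"field": "content"}}
    }
}
# A body like this can be sent directly with client.search(index=..., body=query_body).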
Thank you :)
When I ran into this problem I had to go here for the answer:

Elasticsearch query to return all records

You need to grab the documents in chunks, let's say 2,000 at a time, then loop over them and make multiple queries.
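Here is a minimal sketch of that approach with the low-level Python client, assuming the same index, field names, and filter values as in the question (the placeholder values below only stand in for keyword, json_input["media"], begin, end, and settings.es_index) and assuming content is a plain text field. elasticsearch.helpers.scan pages through the matching documents with the scroll API in batches, so only one batch is held in memory at a time, and the word counting happens on the client side:

from collections import Counter

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

client = Elasticsearch(["my_ip_machine:port"])

# Placeholders; in the question these values come from settings,
# json_input, keyword, begin and end.
index_name = "my_index"
keyword = "some_keyword"
media = "some_media"
begin, end = "2016-01-01", "2016-12-31"

# Same filters as in the question, but without the terms aggregation;
# the counting is done client-side instead.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"content": keyword}},
                {"term": {"provider": media}},
                {"range": {"publish": {"from": begin, "to": end}}},
            ]
        }
    }
}

word_counts = Counter()
# scan() wraps the scroll API and yields hits in batches of `size`
# documents, so memory use stays roughly flat however many documents match.
for hit in scan(client, index=index_name, query=query, size=2000):
    word_counts.update(hit["_source"]["content"].split())

print(word_counts.most_common(10))

With the DSL client you can get the same streaming behaviour from Search.scan(), which also uses the scroll API, so the filters from the question's code can be reused as-is.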