Is the a way to force Spark Aggregate / Reduce to "bubble-up"? -


i tried both aggregate , reduce in spark produce large datasets. noticed part of reduction executed in driver. according mllib blog, have managed implement bubbling, ie. once workers have reduced each task/partition move reduction phase subset of workers until delegated driver.

in use case, have 580 partitions don't have many entries in common, ie. each partition size 2gb partitions aggregated 2gbs. driver delegating reduction of partitions driver oome. have missed api call can or best way force behaviours applying incremental repartitioning ?

tnx

i think looking rdd.treeaggregate applies reducer in multi-leveled way, reducing amount of data passed driver final reduction.

it has been moved mllib spark core on spark 1.3.0. see spark-5430


Comments

Popular posts from this blog

java - Custom OutputStreamAppender not run: LOGBACK: No context given for <MYAPPENDER> -

java - UML - How would you draw a try catch in a sequence diagram? -

c++ - No viable overloaded operator for references a map -