Is the a way to force Spark Aggregate / Reduce to "bubble-up"? -

- August 15, 2013

i tried both aggregate , reduce in spark produce large datasets. noticed part of reduction executed in driver. according mllib blog, have managed implement bubbling, ie. once workers have reduced each task/partition move reduction phase subset of workers until delegated driver.

in use case, have 580 partitions don't have many entries in common, ie. each partition size 2gb partitions aggregated 2gbs. driver delegating reduction of partitions driver oome. have missed api call can or best way force behaviours applying incremental repartitioning ?

tnx

i think looking rdd.treeaggregate applies reducer in multi-leveled way, reducing amount of data passed driver final reduction.

it has been moved mllib spark core on spark 1.3.0. see spark-5430

Search This Blog

Shefl

Is the a way to force Spark Aggregate / Reduce to "bubble-up"? -

Comments

Post a Comment

Popular posts from this blog

c++ - No viable overloaded operator for references a map -

java - UML - How would you draw a try catch in a sequence diagram? -

c++ - Gamma correction doesn't look properly corrected, is this linear? -