Scala - reduceByKey type mismatch
I have the following list:
res22: Array[(String, List[(String, Int)])] = Array((door_182,List((in,1), (in,1))), (door_89,List((in,1), (in,1), (in,1))), (door_180,List((in,1), (in,1), (in,1), (in,1))), (door_83,List((in,1), (in,1), (in,1))), (door_177,List((in,1), (in,1))), (door_23,List((in,1), (in,1))), (door_128,List((in,1), (in,1))), (door_34,List((in,1), (in,1))), (door_18,List((in,1), (in,1))), (door_32,List((in,1))), (door_76,List((in,1), (in,1), (in,1))), (door_87,List((in,1), (in,1), (in,1))), (door_197,List((in,1), (in,1))), (door_133,List((in,1), (in,1))), (door_119,List((in,1), (in,1))), (door_113,List((in,1), (in,1), (in,1), (in,1), (in,1))), (door_155,List((in,1), (in,1), (in,1), (in,1), (in,1))), (door_168,List((in,1), (in,1), (in,1))), (door_115,List((in,1), (in,1))), (door_9,List((in,1), (in,1))),...
I tried to sum the number of "in" entries for each door like this:
scala> reduced.map(n => (n._1, n._2)).reduceByKey((v1,v2) => v1 + v2.toString).collect
I get this error:
<console>:32: error: type mismatch;
 found   : List[(String, Int)]
 required: String
       reduced.map(n => (n._1, n._2)).reduceByKey((v1,v2) => v1 + v2).collect
                                                                  ^
How can I solve this?
You can do it in two steps: first concatenate the lists for each key, then sum the values in each list:
val x = sc.parallelize(List(
  ("door_182", List(("in",1), ("in",1))),
  ("door_89",  List(("in",1), ("in",1), ("in",1))),
  ("door_180", List(("in",1), ("in",1), ("in",1), ("in",1))),
  ("door_83",  List(("in",1), ("in",1), ("in",1))),
  ("door_177", List(("in",1), ("in",1)))
))

// Step 1: concatenate all lists that share a key.
// Step 2: fold each concatenated list down to the sum of its Int components.
x.reduceByKey(_ ::: _)
  .map { case (door, list) => (door, list.foldLeft(0) { case (count1, (in2, count2)) => count1 + count2 }) }
  .collect()

res3: Array[(String, Int)] = Array((door_180,4), (door_83,3), (door_177,2), (door_182,2), (door_89,3))
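To make the type reasoning behind these two steps concrete, here is a minimal plain-Scala sketch that needs no Spark. The names `v1`, `v2`, `merge`, `merged` and `total` are illustrative assumptions, not part of the original code. It shows that the function given to reduceByKey has to return the same type as its inputs, which `_ ::: _` does and `v1 + v2` does not.

```scala
// Plain Scala, no Spark required: the values for one door, as they might sit in two records.
val v1: List[(String, Int)] = List(("in", 1), ("in", 1))
val v2: List[(String, Int)] = List(("in", 1))

// reduceByKey needs a function (V, V) => V, here with V = List[(String, Int)].
// `v1 + v2` does not fit that shape: the only `+` the compiler finds for a List
// is the String-concatenation one, hence the "required: String" message.
val merge: (List[(String, Int)], List[(String, Int)]) => List[(String, Int)] = _ ::: _

// Step 1: concatenate the lists (what reduceByKey(_ ::: _) does for each key).
val merged = merge(v1, v2)

// Step 2: sum the Int component of every pair (the same fold as in the answer).
val total = merged.foldLeft(0) { case (acc, (_, n)) => acc + n }

println(total) // 3
```

In Spark itself, one could presumably also sum each list before reducing, for example `x.mapValues(_.map(_._2).sum).reduceByKey(_ + _)`, so that the values are already Int when reduceByKey runs. That variant is an untested sketch and not part of the original answer.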
Or do it in a single operation with aggregateByKey, which avoids allocating the intermediate concatenated lists:
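x.aggregateByKey(0)(
    // Fold one record's list into the running count for its key.
    { case (count, list) => count + list.foldLeft(0) { case (count1, (in2, count2)) => count1 + count2 } },
    // Merge the partial counts from different partitions.
    _ + _
  )
  .collect()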
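The roles of the two functions passed to aggregateByKey can be sketched in plain Scala, without Spark. Everything here is an illustrative assumption: the two "partitions" are made up, and `aggregatePartition` and `mergePartials` are hypothetical helpers that only imitate the order in which Spark would apply the functions. They are not Spark APIs.

```scala
// Plain Scala, no Spark required. The partitions below are hypothetical sample data.
type Record = (String, List[(String, Int)])

val partition1: List[Record] = List(
  ("door_182", List(("in", 1), ("in", 1))),
  ("door_89",  List(("in", 1)))
)
val partition2: List[Record] = List(
  ("door_89",  List(("in", 1), ("in", 1)))
)

// seqOp: fold one record's list into the running Int count for its key.
val seqOp: (Int, List[(String, Int)]) => Int =
  (count, list) => count + list.foldLeft(0) { case (acc, (_, n)) => acc + n }

// combOp: merge partial counts computed for the same key on different partitions.
val combOp: (Int, Int) => Int = _ + _

// Apply seqOp within one partition, starting from the zero value 0 for each key.
def aggregatePartition(records: List[Record]): Map[String, Int] =
  records.foldLeft(Map.empty[String, Int]) { case (acc, (door, list)) =>
    acc.updated(door, seqOp(acc.getOrElse(door, 0), list))
  }

// Merge two per-partition results with combOp.
def mergePartials(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
  b.foldLeft(a) { case (acc, (door, count)) =>
    acc.updated(door, acc.get(door).map(combOp(_, count)).getOrElse(count))
  }

val result = mergePartials(aggregatePartition(partition1), aggregatePartition(partition2))
println(result) // contains door_182 -> 2 and door_89 -> 3
```

Because the accumulator is an Int rather than a List, no concatenated list is ever built for a key. Only the per-record fold and the integer additions remain.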