Hadoop - Merge reducer outputs to a single file using Java -


I have a Pig script that writes its output to an HDFS directory. The script also generates a _SUCCESS file in that same directory, and the output is split into multiple part files, one per reducer, where the number of reducers is defined via 'set default_parallel n;'.

I am using Java to concatenate/merge the file parts into a single file, and I want to ignore the _SUCCESS file while concatenating. How can I do this in Java?

Thanks in advance.

You can use getmerge through a shell command to merge multiple files into a single file.

Usage: hdfs dfs -getmerge <srcdir> <localdestination/file.txt>
Example: hdfs dfs -getmerge /output/dir/on/hdfs/ /desired/local/output/file.txt

In case you don't want to use a shell command for it, you can write a Java program and use the FileUtil.copyMerge method to merge the output files into a single file. The implementation details are available in the link.
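If you prefer rolling your own merge so you can skip _SUCCESS explicitly, the idea is to list the output directory, keep only the part files (Hadoop's marker files start with "_" or "."), and concatenate them in order. The sketch below illustrates this on the local filesystem with java.nio; against HDFS you would apply the same filtering with org.apache.hadoop.fs.FileSystem.listStatus and a PathFilter. All names here are illustrative, not from the original post.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeParts {

    // True for data files (e.g. part-r-00000); false for markers
    // such as _SUCCESS or hidden files starting with "." or "_".
    public static boolean isDataFile(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Concatenate all part files in srcDir into dstFile, skipping markers.
    public static void merge(Path srcDir, Path dstFile) throws IOException {
        List<Path> parts;
        try (Stream<Path> stream = Files.list(srcDir)) {
            parts = stream.filter(MergeParts::isDataFile)
                          .sorted()                  // keep part-r-00000, 00001, ... order
                          .collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                Files.copy(part, out);               // append each part's bytes
            }
        }
    }
}
```

With the Hadoop API the structure is the same: pass a PathFilter to listStatus that rejects names starting with "_", then copy each remaining file's stream into a single fs.create(...) output stream.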

If you want a single output file on HDFS directly from Pig, you need to pass everything through a single reducer. To do that, set the number of reducers to 1 by putting the line below at the start of your script.

--Assigning 1 reducer in order to generate only 1 output file.
set default_parallel 1;

I hope this helps.

