Hadoop - Merge reducer outputs to a single file using Java
I have a Pig script that generates output in an HDFS directory. The Pig script also creates a _SUCCESS file in the same HDFS directory. The output of the Pig script is split into multiple parts, because the number of reducers used by the script is defined via 'set default_parallel n;'.
I would like to use Java to concatenate/merge all the file parts into a single file, and I want to ignore the _SUCCESS file while concatenating. How can I do this in Java?
Thanks in advance.
You can use getmerge
through a shell command to merge the multiple part files into a single file.
Usage: hdfs dfs -getmerge <srcdir> <destinationdir/file.txt>
Example: hdfs dfs -getmerge /output/dir/on/hdfs/ /desired/local/output/file.txt
In case you don't want to use a shell command for it, you can write a Java program and use FileUtil.copyMerge
to merge the output part files into a single file. The implementation details are available in the link.
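As a rough sketch of that approach (assuming a Hadoop 2.x client on the classpath; the class name and paths below are just placeholders), you can also hand-roll the merge with the FileSystem API and a PathFilter, which lets you skip the _SUCCESS marker explicitly while concatenating:

import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class MergePigOutput {
    public static void main(String[] args) throws IOException {
        // Placeholder paths -- replace with your Pig output directory and target file.
        Path srcDir = new Path("/output/dir/on/hdfs");
        Path dstFile = new Path("/desired/hdfs/output/merged.txt");

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // List only the part files, skipping _SUCCESS and anything else
        // that starts with "_" or "." (Hadoop's convention for hidden files).
        FileStatus[] parts = fs.listStatus(srcDir, path ->
                !path.getName().startsWith("_") && !path.getName().startsWith("."));

        try (OutputStream out = fs.create(dstFile)) {
            for (FileStatus part : parts) {
                if (!part.isFile()) {
                    continue; // ignore any sub-directories
                }
                try (FSDataInputStream in = fs.open(part.getPath())) {
                    // Append this part's bytes to the merged file; 'false' keeps 'out' open.
                    IOUtils.copyBytes(in, out, conf, false);
                }
            }
        }
        fs.close();
    }
}

The filter follows Hadoop's own convention of treating names that start with "_" or "." as hidden, so only the part files end up in the merged output. If I remember correctly, FileUtil.copyMerge copies every file listed in the source directory; the _SUCCESS marker is empty so it adds no content either way, but the explicit filter makes the intent clear.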
If you want a single output file on HDFS directly from Pig, you need to pass the output through a single reducer. To do that, set the number of reducers to 1 by putting the line below at the start of your script.
-- Assigning 1 reducer in order to generate only 1 output file.
set default_parallel 1;
I hope this helps you.