Running a Hadoop jar using Luigi (Python)
I need to run a Hadoop jar job using Luigi from Python. I searched and found examples of writing a mapper and reducer in Luigi, but nothing that runs a Hadoop jar directly.
I need to run an already compiled Hadoop jar directly. How can I do it?
You need to use the luigi.contrib.hadoop_jar package (source code).
In particular, you need to extend HadoopJarJobTask. For example, like this:
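    from luigi.contrib.hadoop_jar import HadoopJarJobTask
    from luigi.contrib.hdfs.target import HdfsTarget


    class TextExtractorTask(HadoopJarJobTask):
        def output(self):
            # HDFS location the job writes to; Luigi treats the task as done once it exists
            return HdfsTarget('data/processed/')

        def jar(self):
            # Path to the compiled job jar
            return 'jobfile.jar'

        def main(self):
            # Fully qualified name of the main class inside the jar
            return 'com.ololo.HadoopJob'

        def args(self):
            # Arguments passed to the main class
            return ['--param1', '1', '--param2', '2']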
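To make it clearer what the three methods above feed into, here is a rough, Luigi-free sketch of the command line such a task ends up running. The helper `build_hadoop_command` and its `hadoop_bin` parameter are hypothetical names made up for illustration, not part of Luigi's API. The real runner also handles details such as job configuration options and path fixing, so treat this as an approximation.

```python
def build_hadoop_command(jar, main=None, args=(), hadoop_bin="hadoop"):
    """Assemble an argv list of the form: hadoop jar <jar> [main class] [args...].

    Hypothetical helper for illustration only; it approximates the command
    that is built from jar(), main() and args().
    """
    command = [hadoop_bin, "jar", jar]
    if main:
        # The main class is optional when the jar manifest already names one
        command.append(main)
    command.extend(str(a) for a in args)
    return command


command = build_hadoop_command(
    "jobfile.jar",
    "com.ololo.HadoopJob",
    ["--param1", "1", "--param2", "2"],
)
print(" ".join(command))
# prints: hadoop jar jobfile.jar com.ololo.HadoopJob --param1 1 --param2 2
```

In other words, jar(), main() and args() map onto the pieces of the `hadoop jar` invocation you would otherwise type by hand.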
You can also include building the jar file with Maven in the workflow:
    import subprocess

    import luigi
    from luigi import LocalTarget
    from luigi.contrib.hadoop_jar import HadoopJarJobTask
    from luigi.contrib.hdfs.target import HdfsTarget


    class BuildJobTask(luigi.Task):
        def output(self):
            # The jar produced by the Maven build
            return LocalTarget('target/jobfile.jar')

        def run(self):
            # check_call raises if Maven fails, so a broken build fails the task
            subprocess.check_call(['mvn', 'clean', 'package', '-DskipTests'])


    class YourHadoopTask(HadoopJarJobTask):
        def requires(self):
            # Build the jar before submitting the job
            return BuildJobTask()

        def output(self):
            return HdfsTarget('data/processed/')

        def jar(self):
            # Use the jar built by BuildJobTask
            return self.input().path

        def main(self):
            return 'com.ololo.HadoopJob'

        def args(self):
            return ['--param1', '1', '--param2', '2']