Running a Hadoop jar using Luigi (Python)


I need to run a Hadoop jar job using Luigi from Python. I searched and found examples of writing a mapper and reducer in Luigi, but nothing that runs a Hadoop jar directly.

I need to run an already compiled Hadoop jar directly. How can I do it?

You need to use the luigi.contrib.hadoop_jar package.

In particular, you need to extend HadoopJarJobTask. For example, like this:

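    from luigi.contrib.hadoop_jar import HadoopJarJobTask
    from luigi.contrib.hdfs.target import HdfsTarget


    class TextExtractorTask(HadoopJarJobTask):
        def output(self):
            # HDFS location the job writes to; Luigi treats the task as done once it exists.
            return HdfsTarget('data/processed/')

        def jar(self):
            # Path to the compiled jar.
            return 'jobfile.jar'

        def main(self):
            # Fully qualified name of the main class inside the jar.
            return 'com.ololo.HadoopJob'

        def args(self):
            # Arguments passed to the main class.
            return ['--param1', '1', '--param2', '2']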
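To make the roles of jar(), main() and args() more concrete, here is a rough sketch of the command line such a task corresponds to. It uses only the standard library and is not Luigi's actual implementation; the helper name hadoop_jar_command is made up for illustration, and Luigi's own runner may also add job configuration options and use the hadoop executable from its configuration.

```python
import shlex


def hadoop_jar_command(jar, main=None, args=()):
    """Build the argument list for a plain `hadoop jar` invocation (hypothetical helper)."""
    command = ['hadoop', 'jar', jar]
    if main:
        # The main class can be omitted when the jar's manifest already names one.
        command.append(main)
    # Convert every argument to a string so numbers can be passed as well.
    command.extend(str(arg) for arg in args)
    return command


command = hadoop_jar_command(
    'jobfile.jar', 'com.ololo.HadoopJob', ['--param1', '1', '--param2', '2'])
print(' '.join(shlex.quote(part) for part in command))
# prints: hadoop jar jobfile.jar com.ololo.HadoopJob --param1 1 --param2 2
```

So the values returned by the three methods end up, in that order, after `hadoop jar` on the command line.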

You can also include building the jar file with Maven in the workflow:

    import subprocess

    import luigi
    from luigi.contrib.hadoop_jar import HadoopJarJobTask
    from luigi.contrib.hdfs.target import HdfsTarget


    class BuildJobTask(luigi.Task):
        def output(self):
            # The jar produced by the Maven build.
            return luigi.LocalTarget('target/jobfile.jar')

        def run(self):
            # check_call raises if the Maven build fails, so a broken build stops the pipeline.
            subprocess.check_call(['mvn', 'clean', 'package', '-DskipTests'])


    class YourHadoopTask(HadoopJarJobTask):
        def requires(self):
            # Build the jar before running the Hadoop job.
            return BuildJobTask()

        def output(self):
            return HdfsTarget('data/processed/')

        def jar(self):
            # Use the jar built by BuildJobTask.
            return self.input().path

        def main(self):
            return 'com.ololo.HadoopJob'

        def args(self):
            return ['--param1', '1', '--param2', '2']
