Spark - Naive Bayes classifier ValueError -


I have the following issue when training a Naive Bayes classifier. I'm getting this error:

  File "/home/juande/desktop/spark-1.3.0-bin-hadoop2.4/python/pyspark/mllib/classification.py", line 372, in train
      return NaiveBayesModel(labels.toArray(), pi.toArray(), numpy.array(theta))
  ValueError: invalid __array_struct__

It happens when I train the model with these lines:

dataframe = dataframe.map(lambda x: LabeledPoint(sections_to_number[x[4]], tf.transform([x[0], x[1], x[2], x[3]])))
model = NaiveBayes.train(dataframe, 1.0)

where sections_to_number is a dictionary that maps section strings to float numbers, for example sports -> 0, weather -> 1, and so on.
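For reference, a minimal sketch of what such a mapping might look like; the section names and numbering here are assumptions for illustration, and the values are written as explicit floats since LabeledPoint expects a float label:

```python
# Hypothetical mapping from section name to a numeric class label.
# Explicit floats, since LabeledPoint labels are doubles.
sections_to_number = {
    "sports": 0.0,
    "weather": 1.0,
    "politics": 2.0,
}

# Look up the label for a row's section string (x[4] in the question).
label = sections_to_number["weather"]
```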

However, if I train using a fixed number instead of the sections_to_number mapping, there is no error:

dataframe = dataframe.map(lambda x: LabeledPoint(10.0, tf.transform([x[0], x[1], x[2], x[3]])))
model = NaiveBayes.train(dataframe, 1.0)

Am I missing something? Thanks.

NaiveBayes in the Spark ML package expects a DataFrame with two columns: label and features. The label column holds the target or class, and the features column holds an org.apache.spark.ml.linalg.Vector. For a numeric/continuous dataset, the features column can be built as a Vector directly; a categorical dataset needs to be converted to numeric first, using StringIndexer, OneHotEncoder, or the other feature-extraction techniques described at http://spark.apache.org/docs/latest/ml-features.html#stringindexer.
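StringIndexer maps each distinct string to a numeric index ordered by label frequency, with the most frequent label getting index 0.0. As a rough pure-Python sketch of that behavior (not the Spark API itself; the alphabetical tie-break is an assumption made here for determinism):

```python
from collections import Counter

def string_indexer(values):
    """Assign each distinct string a double index, most frequent first
    (ties broken alphabetically here), mimicking StringIndexer's
    default frequencyDesc ordering."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    mapping = {v: float(i) for i, v in enumerate(ordered)}
    return [mapping[v] for v in values], mapping

# "sports" appears twice, so it gets index 0.0.
indexed, mapping = string_indexer(["sports", "weather", "sports", "politics"])
```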

For example, OneHotEncoder converts foo -> 0 and baar -> 1 into a vector of doubles, and the resulting DataFrame with label and features columns is passed to the algorithm.
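The idea behind one-hot encoding can be sketched in plain Python: each category index becomes a vector of doubles with a single 1.0. Note this is a simplified illustration; Spark's OneHotEncoder actually drops the last category by default, which this version does not:

```python
def one_hot(index, num_categories):
    """Encode a category index as a dense vector of doubles with a
    single 1.0 at that position (simplified sketch)."""
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec

# foo -> 0, baar -> 1, as in the example above
categories = {"foo": 0, "baar": 1}
encoded = one_hot(categories["baar"], len(categories))
```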

