python - NumPy - constructing a matrix of Jaro (or Levenshtein) distances using numpy.fromfunction


I am doing some text analysis right now, and part of what I need is a matrix of Jaro distances between all of the words in a specific list (so a pairwise distance matrix), like this one:

       │cheese chores geese  gloves
───────┼───────────────────────────
cheese │    0   0.222  0.177  0.444
chores │0.222       0  0.422  0.333
geese  │0.177   0.422      0  0.300
gloves │0.444   0.333  0.300      0

So I tried to construct it using numpy.fromfunction. Per the documentation and examples, it passes the coordinates to a function, gets its results, and constructs a matrix of those results.

I tried the approach below:

    import numpy as np
    from jellyfish import jaro_distance

    def distance(i, j):
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])

    feature_dict = 'cheese chores geese gloves'.split()
    distance_matrix = np.fromfunction(distance, shape=(len(feature_dict), len(feature_dict)))

Note: jaro_distance accepts 2 strings and returns a float.

And I got an error:

    File "<pyshell#26>", line 4, in distance
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])
    TypeError: only integer arrays with one element can be converted to an index

I added print(i) and print(j) at the beginning of the function and found that, instead of real coordinates, something odd is passed:

    [[ 0.  0.  0.  0.]
     [ 1.  1.  1.  1.]
     [ 2.  2.  2.  2.]
     [ 3.  3.  3.  3.]]
    [[ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]]

Why? The examples on the NumPy site seem to show just 2 integers being passed, nothing else.

I tried to reproduce their example using a lambda function, but I get the same error:

distance_matrix = np.fromfunction(lambda i, j: 1 - jaro_distance(feature_dict[i], feature_dict[j]), shape=(len(feature_dict),len(feature_dict))) 

Any help is appreciated - I assume I have misunderstood it somehow.

As suggested by @xnx, I have investigated the question and found out that fromfunction does not pass the coordinates one by one; it passes all of the indices at the same time. This means that if the shape of the array is (2,2), numpy will not perform f(0,0), f(0,1), f(1,0), f(1,1), but will rather perform a single call:

f([[0., 0.], [1., 1.]], [[0., 1.], [0., 1.]]) 
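
This behaviour can be checked with a small probe. The snippet below is only a sketch: probe and calls are hypothetical names introduced for illustration, and dtype=int is passed so that the index arrays hold integers rather than the default floats.

```python
import numpy as np

calls = []

def probe(i, j):
    # Record exactly what fromfunction hands to the callback.
    calls.append((i, j))
    return i + j

result = np.fromfunction(probe, (2, 2), dtype=int)

print(len(calls))    # how many times probe was called
print(calls[0][0])   # array of row indices
print(calls[0][1])   # array of column indices
```

The callback runs once and receives two whole index arrays, which is why indexing a plain Python list with i or j fails.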

But it looks like my specific function can be vectorized, and it will then produce the needed results. The code to achieve what is needed is below:

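    from jellyfish import jaro_distance
    import numpy as np

    def distance(i, j):
        # i and j arrive as single indices because of np.vectorize below
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])

    feature_dict = 'cheese chores geese gloves'.split()

    funcproxy = np.vectorize(distance)

    # dtype=int so the indices are integers; the default float indices cannot index a list
    distance_matrix = np.fromfunction(funcproxy, shape=(len(feature_dict), len(feature_dict)), dtype=int)

And it works fine.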
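
Since the title also mentions Levenshtein distances, here is a rough sketch of the same np.vectorize plus fromfunction pattern that does not depend on jellyfish. The levenshtein helper is a hypothetical pure-Python implementation written for this example, not a library function, and words stands in for feature_dict.

```python
import numpy as np

def levenshtein(a, b):
    """Edit distance via dynamic programming (hypothetical helper)."""
    previous = list(range(len(b) + 1))
    for row, char_a in enumerate(a, start=1):
        current = [row]
        for col, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[col] + 1,          # deletion
                               current[col - 1] + 1,       # insertion
                               previous[col - 1] + cost))  # substitution
        previous = current
    return previous[-1]

words = 'cheese chores geese gloves'.split()

def distance(i, j):
    # Called once per (i, j) pair thanks to np.vectorize.
    return levenshtein(words[i], words[j])

n = len(words)
# dtype=int keeps the indices usable for list lookups.
distance_matrix = np.fromfunction(np.vectorize(distance), (n, n), dtype=int)
print(distance_matrix)
```

Note that np.vectorize is a convenience wrapper around a Python-level loop, so this is about readability rather than speed.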

