Python - NumPy - constructing a matrix of Jaro (or Levenshtein) distances using numpy.fromfunction
I'm doing some text analysis right now, and as part of it I need a matrix of Jaro distances between all of the words in a specific list (so a pairwise distance matrix), like this one:
           │ cheese  chores  geese   gloves
    ───────┼───────────────────────────────
    cheese │ 0       0.222   0.177   0.444
    chores │ 0.222   0       0.422   0.333
    geese  │ 0.177   0.422   0       0.300
    gloves │ 0.444   0.333   0.300   0
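For reference, a matrix like this can be built without any NumPy tricks by calling the distance function once per pair in a nested loop. The sketch below does not use jellyfish: `jaro_similarity` is a hypothetical pure-Python stand-in written for illustration, assuming the standard Jaro definition (characters match if they are equal and within max(len) // 2 - 1 positions of each other, and half the out-of-order matches count as transpositions). It may differ from jellyfish in edge cases.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Hypothetical stand-in for jellyfish's Jaro function (1.0 = identical)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are at most this far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Compare the matched characters in order; half the mismatches are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def pairwise_distances(words):
    """Return a list-of-lists matrix holding 1 - Jaro similarity for every pair."""
    return [[1 - jaro_similarity(a, b) for b in words] for a in words]


words = "cheese chores geese gloves".split()
for word, row in zip(words, pairwise_distances(words)):
    print(f"{word:>7}", " ".join(f"{d:.3f}" for d in row))
```

With this stand-in the cheese/geese entry comes out as about 0.1778, so it prints 0.178 where the table shows 0.177; the other entries agree with the table to three decimals.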
So I tried to construct it using numpy.fromfunction. As I understood the documentation and examples, it passes the coordinates to the function, gets the results, and constructs a matrix from those results.
I tried the approach below:

    from jellyfish import jaro_distance
    import numpy as np

    def distance(i, j):
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])

    feature_dict = 'cheese chores geese gloves'.split()
    distance_matrix = np.fromfunction(distance, shape=(len(feature_dict), len(feature_dict)))
Note: jaro_distance accepts two strings and returns a float.
And I got this error:

      File "<pyshell#26>", line 4, in distance
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])
    TypeError: only integer arrays with one element can be converted to an index
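The TypeError is raised by the list indexing rather than by jellyfish: a Python list can only be indexed by something that converts to a single integer, and a multi-element NumPy array does not. A minimal sketch that reproduces the failure without jellyfish (the exact message differs between NumPy versions, so it is only printed, not relied on):

```python
import numpy as np

words = "cheese chores geese gloves".split()
# A 4x4 float coordinate array, like the first-axis array fromfunction builds.
rows = np.indices((4, 4), dtype=float)[0]

error = None
try:
    words[rows]  # a list index must be one integer, not an array
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```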
I added print(i) and print(j) at the beginning of the function and found that, instead of real coordinates, something odd was passed:
    [[ 0.  0.  0.  0.]
     [ 1.  1.  1.  1.]
     [ 2.  2.  2.  2.]
     [ 3.  3.  3.  3.]]
    [[ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]
     [ 0.  1.  2.  3.]]
Why? The examples on the NumPy site seem to show two integers being passed, nothing else.
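The printout is consistent with fromfunction calling the function only once, with one full coordinate array per axis, rather than once per (i, j) pair. A small sketch that records every call; `record_call` is a hypothetical helper written just for this check:

```python
import numpy as np

calls = []

def record_call(i, j):
    # Store exactly what fromfunction passes in, then return an array of
    # the same shape so fromfunction gets a valid result back.
    calls.append((i, j))
    return i + j

result = np.fromfunction(record_call, shape=(4, 4))

print(len(calls))        # number of times the function was invoked
first_i, first_j = calls[0]
print(first_i.shape, first_i.dtype)
print(first_i)
print(first_j)
```

It reports a single call, and the two float arrays hold the same values as the ones printed in the question.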
I tried to reproduce the example using a lambda function, but got the same error:

    distance_matrix = np.fromfunction(
        lambda i, j: 1 - jaro_distance(feature_dict[i], feature_dict[j]),
        shape=(len(feature_dict), len(feature_dict)))
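For contrast, lambdas that do work with fromfunction are built only from operations that act elementwise on whole arrays, such as arithmetic or `==`. A short sketch in the style of the NumPy documentation examples (not a copy of them):

```python
import numpy as np

# Each operation works on the complete coordinate arrays at once,
# so one call of the lambda yields the full 3x3 result.
identity = np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int)
sums = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)

print(identity)
print(sums)

# Indexing a Python list is not elementwise, which is why the lambda in
# the question fails: feature_dict[i] needs one integer, not an array.
```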
Any help is appreciated. I assume I misunderstood something.
As suggested by @xnx, I investigated the question and found out that fromfunction does not pass the coordinates one by one; it passes all of the indices at the same time. This means that if the shape of the array is (2, 2), NumPy does not perform f(0,0), f(0,1), f(1,0), f(1,1), but rather performs:
f([[0., 0.], [1., 1.]], [[0., 1.], [0., 1.]])
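Put differently, fromfunction(f, shape) behaves like building the coordinate arrays with np.indices and calling f once on them. A sketch of that equivalence for the (2, 2) case, where `f` is a hypothetical vectorized function used only for illustration:

```python
import numpy as np

def f(i, j):
    # Any function made of elementwise array operations works here.
    return 10 * i + j

shape = (2, 2)
# The two coordinate arrays that fromfunction hands to f for shape (2, 2).
i, j = np.indices(shape, dtype=float)
print(i.tolist())
print(j.tolist())

via_fromfunction = np.fromfunction(f, shape)
via_indices = f(i, j)
print(np.array_equal(via_fromfunction, via_indices))
```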
But it looks like if the specific function is vectorized, it produces the needed results. The code to achieve this is below:
    from jellyfish import jaro_distance
    import numpy as np

    def distance(i, j):
        return 1 - jaro_distance(feature_dict[i], feature_dict[j])

    feature_dict = 'cheese chores geese gloves'.split()
    # np.vectorize calls distance once per (i, j) element pair
    funcproxy = np.vectorize(distance)
    # dtype=int keeps the coordinates integral, so they can index the list
    distance_matrix = np.fromfunction(funcproxy, shape=(len(feature_dict), len(feature_dict)), dtype=int)

And it works fine.
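The NumPy documentation describes np.vectorize as a convenience wrapper that is essentially a Python for loop, so this route still makes one Python call per cell. A sketch comparing it with a plain list comprehension that gives the same matrix; `toy_distance` is a hypothetical stand-in for `1 - jaro_distance(...)` so the example runs without jellyfish, and `dtype=int` is passed so the coordinates arrive as integers that can index a list:

```python
import numpy as np

words = "cheese chores geese gloves".split()

def toy_distance(a: str, b: str) -> float:
    # Hypothetical stand-in for 1 - jaro_distance(a, b): the share of
    # positions (over the longer word) where the two words differ.
    longest = max(len(a), len(b))
    same = sum(x == y for x, y in zip(a, b))
    return 1 - same / longest

def cell(i, j):
    # np.vectorize passes one pair of integer coordinates per call.
    return toy_distance(words[i], words[j])

n = len(words)
# otypes=[float] fixes the output type up front instead of letting
# np.vectorize probe the function with the first element.
via_fromfunction = np.fromfunction(np.vectorize(cell, otypes=[float]),
                                   shape=(n, n), dtype=int)

# The same matrix from an explicit double loop over the words.
via_comprehension = np.array([[toy_distance(a, b) for b in words]
                              for a in words])

print(np.allclose(via_fromfunction, via_comprehension))
```

For a short word list the comprehension is arguably easier to read, while the fromfunction version keeps the "function of coordinates" shape from the question.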