python - Best way to mark (split) the fields in each string of a dataset
I have a dataset containing 485k strings (1.1 GB). Each string is 700 characters long and holds 250 variables (1-16 characters per variable), with no separators between them. The length of each variable is known. What is the best way to modify the data so that the variables are marked off with a separator symbol?
For example, I have strings like:

0123456789012...
1234567890123...

and an array of lengths: 5, 3, 1, 4, ...

The result should look like this:

01234,567,8,9012,...
12345,678,9,0123,...
Could you help me with this? Python or R tools are preferred.
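Because the field lengths are known, one plain-Python approach is to turn the lengths into cumulative slice offsets and cut every line at those offsets. This is a minimal sketch, not a definitive implementation: the function names `field_bounds`, `split_fixed_width` and `mark_file` are hypothetical, and it assumes one record per line, UTF-8 text, and that the widths add up to the record length (any characters past the last width are dropped).

```python
from itertools import accumulate


def field_bounds(widths):
    """Turn field lengths such as [5, 3, 1, 4] into (start, end) slice pairs."""
    ends = list(accumulate(widths))  # e.g. [5, 8, 9, 13]
    starts = [0] + ends[:-1]         # e.g. [0, 5, 8, 9]
    return list(zip(starts, ends))


def split_fixed_width(record, widths, sep=","):
    """Cut one fixed-width record into fields and join them with sep."""
    return sep.join(record[s:e] for s, e in field_bounds(widths))


def mark_file(src_path, dst_path, widths, sep=","):
    """Rewrite a file line by line, so the whole file is never held in memory."""
    bounds = field_bounds(widths)  # compute the offsets once, not once per line
    # Assumption: the input is UTF-8 text with one record per line.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            record = line.rstrip("\r\n")
            dst.write(sep.join(record[s:e] for s, e in bounds) + "\n")
```

For instance, `split_fixed_width("0123456789012", [5, 3, 1, 4])` returns `"01234,567,8,9012"`, which matches the desired output above; `mark_file` applies the same cut to every line of a file.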
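You can use pandas' `read_fwf` with the known widths:

    In [321]: import io
         ...: import pandas as pd
         ...: t = """0123456789012..."""
         ...: pd.read_fwf(io.StringIO(t), widths=[5, 3, 1, 4], header=None)
    Out[321]:
          0    1  2     3
    0  1234  567  8  9012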
This gives you a DataFrame, so you can access each individual column for whatever purpose you require.
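One caveat with the `read_fwf` call above: the first field `01234` comes back as the integer `1234`, so leading zeros are lost. The sketch below keeps every field as text with `dtype=str` and reads the file in chunks so the full 1.1 GB does not have to fit in memory at once, then writes comma-separated lines. The function name `fwf_to_csv` and the default chunk size are illustrative assumptions. `keep_default_na=False` stops strings such as `NA` from being turned into missing values. Also note that `read_fwf` treats spaces and tabs as filler by default, so whitespace at the edges of a field is stripped.

```python
import io

import pandas as pd


def fwf_to_csv(src, dst, widths, chunksize=50_000):
    """Convert fixed-width records to comma-separated lines, chunk by chunk.

    src: a path or text buffer readable by pandas; dst: an open text handle.
    """
    reader = pd.read_fwf(
        src,
        widths=widths,
        header=None,
        dtype=str,              # keep "01234" as text instead of the integer 1234
        keep_default_na=False,  # do not turn strings like "NA" into missing values
        chunksize=chunksize,    # yield DataFrames of at most this many rows
    )
    for chunk in reader:
        # Every chunk is written to the same open handle, without header or index.
        chunk.to_csv(dst, header=False, index=False)


if __name__ == "__main__":
    sample = io.StringIO("0123456789012\n1234567890123\n")
    out = io.StringIO()
    fwf_to_csv(sample, out, [5, 3, 1, 4])
    print(out.getvalue(), end="")
```

To write to disk, open the output handle with `newline=""` as the pandas documentation recommends for file objects, e.g. `with open("marked.csv", "w", newline="") as dst: fwf_to_csv("data.txt", dst, widths)`, where `data.txt`, `marked.csv` and `widths` (your list of 250 lengths) are placeholders.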