python - The best way to mark (split?) a dataset in each string


I have a dataset containing 485k strings (1.1 GB). Each string contains about 700 chars featuring about 250 variables (1-16 chars per variable), but it doesn't have any split marks. The length of each variable is known. What is the best way to modify the data and mark the boundaries with a symbol such as , ?


For example, I have strings like:

0123456789012...
1234567890123...

and an array of lengths: 5,3,1,4,... The result should look like this:

01234,567,8,9012,...
12345,678,9,0123,...

Could anyone help me with this? Python or R tools are preferred...

In pandas you can load this using read_fwf:

In [321]:
import io
import pandas as pd

t = """0123456789012..."""
pd.read_fwf(io.StringIO(t), widths=[5,3,1,4], header=None)

Out[321]:
      0    1  2     3
0  1234  567  8  9012

This will give you a DataFrame, allowing you to access each individual column for whatever purpose you require.
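If the goal is only to write the comma-separated text back out, rather than to work with a DataFrame, plain string slicing may be enough. The following is a minimal sketch, not part of the original answer: the helper names `make_slices`, `split_fixed_width` and `convert_file` are hypothetical, and it assumes every line is at least as long as the sum of the widths. It reads the file line by line, so the whole 1.1 GB never has to sit in memory at once.

```python
from itertools import accumulate


def make_slices(widths):
    """Turn a list of field widths into slice objects, e.g. [5, 3] -> [0:5, 5:8]."""
    ends = list(accumulate(widths))   # running totals: the end offset of each field
    starts = [0] + ends[:-1]          # each field starts where the previous one ended
    return [slice(s, e) for s, e in zip(starts, ends)]


def split_fixed_width(line, slices, sep=","):
    """Cut one fixed-width record into fields and join them with `sep`."""
    return sep.join(line[s] for s in slices)


def convert_file(src_path, dst_path, widths, sep=","):
    """Stream a fixed-width file into a delimited file, one line at a time."""
    slices = make_slices(widths)
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(split_fixed_width(line.rstrip("\n"), slices, sep) + "\n")


# Only the first four widths from the question are used here.
slices = make_slices([5, 3, 1, 4])
print(split_fixed_width("0123456789012", slices))  # 01234,567,8,9012
print(split_fixed_width("1234567890123", slices))  # 12345,678,9,0123
```

Because the fields stay strings, leading zeros are kept. In the read_fwf output above the first field shows as 1234 because it was parsed as an integer; passing dtype=str to read_fwf should keep it as 01234.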

