python - The best way to mark (split?) dataset in each string

- June 15, 2012

I have a dataset containing 485k strings (1.1 GB). Each string contains 700 characters holding 250 variables (1-16 characters per variable) and doesn't have any split marks. The length of each variable is known. What is the best way to modify the data and mark the field boundaries with the symbol `,`?

For example, if I have strings like:

0123456789012...
1234567890123...

and an array of lengths: 5,3,1,4,... then I should get this:

01234,567,8,9012,...
12345,678,9,0123,...
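The transformation described above can be sketched in plain Python by slicing each string at the cumulative field widths. This is only a minimal illustration: the helper name `split_fixed_width` is hypothetical, and only the first four widths from the example are used.

```python
from itertools import accumulate


def split_fixed_width(line, widths, sep=","):
    """Insert `sep` between the fixed-width fields of `line`."""
    # Cumulative sums of the widths give the end offset of every field.
    ends = list(accumulate(widths))
    # Each field starts where the previous one ended.
    starts = [0] + ends[:-1]
    return sep.join(line[s:e] for s, e in zip(starts, ends))


widths = [5, 3, 1, 4]  # first four widths from the example above
print(split_fixed_width("0123456789012", widths))
print(split_fixed_width("1234567890123", widths))
```

Because the fields are sliced as strings, leading zeros such as the one in `01234` are kept as-is.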

Could anyone help me with this? Python or R tools are preferred for me...

In pandas you can load this using read_fwf:

In [321]:
import io
import pandas as pd

t = """0123456789012..."""
pd.read_fwf(io.StringIO(t), widths=[5, 3, 1, 4], header=None)

Out[321]:
      0    1  2     3
0  1234  567  8  9012

This will give you a dataframe, allowing you to access each individual column for whatever purpose you require.
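Note that in the output above the first field was parsed as an integer, so `01234` became `1234`. If the goal is a comma-separated copy of the data with every character preserved, one possible approach is to read all fields as strings and write them back out with `to_csv`. The following is a sketch under that assumption; the two sample rows and the four widths come from the question, and the variable names are illustrative only.

```python
import io

import pandas as pd

widths = [5, 3, 1, 4]  # first four widths from the question
data = "0123456789012\n1234567890123\n"

# dtype=str keeps every field as text, so leading zeros are not dropped.
df = pd.read_fwf(io.StringIO(data), widths=widths, header=None, dtype=str)

# Write the columns back out comma-separated, without index or header row.
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```

For the real 1.1 GB file, a path would be passed in place of the `io.StringIO` object, with the full list of 250 widths.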
