python - Pandas Read CSV with string delimiters via regex -


I am trying to import a weirdly formatted text file into a pandas DataFrame. Two example lines are below:

loaded lane       1   mat. type=    2    leffect=    1    span=  200.    space=   10.    beta=   3.474 loadeffect 5075.    lmax= 3643.    cov=  .13
loaded lane       1   mat. type=    3    leffect=    1    span=  200.    space=   10.    beta=   3.515 loadeffect10009.    lmax= 9732.    cov=  .08

I first tried the following:

df = pd.read_csv('beta.txt', header=None, delim_whitespace=True, usecols=[2,5,7,9,11,13,15,17,19])

This seemed to work fine, but it got messed up when it hit the second example line above, where there is no whitespace after the loadeffect string (you may need to scroll a bit to the right to see it in the example). I got a result like:

632   1   2   1  200  10  3.474  5075.  3643.  0.13
633   1   3   1  200  10  3.515  lmax=   cov=   NaN
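The column shift can be reproduced without pandas. The following is a minimal sketch that only assumes the two sample lines shown earlier and plain whitespace splitting; the variable names are made up for illustration. Because loadeffect and 10009. are glued together in the second line, that line has one token fewer, so every position from 15 onward points one token too far to the right.

```python
# The two sample lines from the question (spacing kept as shown there).
good = ("loaded lane       1   mat. type=    2    leffect=    1    span=  200."
        "    space=   10.    beta=   3.474 loadeffect 5075.    lmax= 3643.    cov=  .13")
bad = ("loaded lane       1   mat. type=    3    leffect=    1    span=  200."
       "    space=   10.    beta=   3.515 loadeffect10009.    lmax= 9732.    cov=  .08")

# Same positions as the usecols list in the read_csv call.
wanted = [2, 5, 7, 9, 11, 13, 15, 17, 19]

good_tokens = good.split()
bad_tokens = bad.split()

print(len(good_tokens), len(bad_tokens))

# Pick the wanted positions; a missing position becomes None
# (pandas would show it as NaN).
picked_bad = [bad_tokens[i] if i < len(bad_tokens) else None for i in wanted]
print(picked_bad)
```

In the second line, positions 15 and 17 land on the labels lmax= and cov=, and position 19 does not exist, which matches the broken row above.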

Then I decided to use a regular expression to define my delimiters. After many trial-and-error runs (I am no expert in regex), I managed to get close with the following line:

df = pd.read_csv('beta.txt', header=None, sep='/s +|loaded lane|mat. type=|leffect=|span=|space=|beta=|loadeffect|lmax=|cov=', engine='python')

This works, but for some reason it creates a NaN column at the beginning:

632 NaN  1  2  1  200  10  3.474   5075  3643  0.13
633 NaN  1  3  1  200  10  3.515  10009  9732  0.08

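At this point I think I can just delete the first column and get away with it. However, I wonder what the correct way would be to set up the regex so that it parses this text file correctly in one shot. Any ideas? Other than that, I am sure there is a smarter way to parse this text file. I would be glad to hear your recommendations.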
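A likely explanation for the extra column is that every line starts with one of the delimiters (loaded lane), so the text before the first delimiter is an empty field. The sketch below uses re.split as a stand-in for what the python engine does with a regex separator; that equivalence is an assumption about pandas internals, not something taken from its documentation. Note also that the first alternative, /s +, contains a literal slash and never matches anything in these lines, so all the splitting is done by the label strings.

```python
import re

# Separator copied from the read_csv call in the question.
sep = r"/s +|loaded lane|mat. type=|leffect=|span=|space=|beta=|loadeffect|lmax=|cov="

line = ("loaded lane       1   mat. type=    3    leffect=    1    span=  200."
        "    space=   10.    beta=   3.515 loadeffect10009.    lmax= 9732.    cov=  .08")

# Split on the labels, then strip the padding around each value.
fields = re.split(sep, line)
values = [f.strip() for f in fields]
print(values)
```

The first element is an empty string, which would show up as NaN once parsed. Since the line begins with a delimiter, splitting alone cannot avoid that empty field, so dropping the first column afterwards (for example with `df.iloc[:, 1:]`) looks like a reasonable workaround rather than a hack.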

Thanks!

import re
import csv
import pandas as pd

csvfile = open("parsing.txt")  # open the text file
reader = csv.reader(csvfile)
new_list = []
for line in reader:
    for i in line:
        # collect every number in the field (e.g. 3.474, .13, 200)
        new_list.append(re.findall(r'(\d*\.\d+|\d+)', i))
csvfile.close()

table = pd.DataFrame(new_list)
table  # output: a pandas DataFrame with the extracted values
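One thing to keep in mind with this approach is that re.findall returns strings, so the resulting columns are text until they are converted. The following is a minimal sketch of the same extraction idea with a float conversion added, run on the two sample lines from the question. The column names and the parse_line helper are hypothetical, chosen here from the labels in the data.

```python
import re

# Same number pattern as in the answer: decimals like 3.474 or .13, else integers.
NUMBER = re.compile(r"\d*\.\d+|\d+")

# Hypothetical column names, derived from the labels in each line.
COLUMNS = ["lane", "mat_type", "leffect", "span", "space",
           "beta", "loadeffect", "lmax", "cov"]

def parse_line(line):
    """Return every number found in the line as a float."""
    return [float(tok) for tok in NUMBER.findall(line)]

sample = [
    "loaded lane       1   mat. type=    2    leffect=    1    span=  200."
    "    space=   10.    beta=   3.474 loadeffect 5075.    lmax= 3643.    cov=  .13",
    "loaded lane       1   mat. type=    3    leffect=    1    span=  200."
    "    space=   10.    beta=   3.515 loadeffect10009.    lmax= 9732.    cov=  .08",
]

rows = [parse_line(line) for line in sample]
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

The list of rows could then be handed to `pd.DataFrame(rows, columns=COLUMNS)`. This pattern ignores signs and exponents, so it would need extending if the file contains negative numbers or values such as 1e5.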
