python - Pandas Read CSV with string delimiters via regex
I am trying to import a weirdly formatted text file into a pandas DataFrame. Two example lines are below:
loaded lane 1 mat. type= 2 leffect= 1 span= 200. space= 10. beta= 3.474 loadeffect 5075. lmax= 3643. cov= .13
loaded lane 1 mat. type= 3 leffect= 1 span= 200. space= 10. beta= 3.515 loadeffect10009. lmax= 9732. cov= .08
First I tried the following:
df = pd.read_csv('beta.txt', header=None, delim_whitespace=True, usecols=[2,5,7,9,11,13,15,17,19])
This seemed to work fine, but it got messed up when it hit the second example line above, where there is no whitespace after the loadeffect string (you may need to scroll a bit to the right to see it in the example). I got a result like:
632 1 2 1 200 10 3.474 5075. 3643. 0.13
633 1 3 1 200 10 3.515 lmax= cov= nan
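The column shift can be reproduced without pandas. The sketch below splits the two example lines on whitespace, which is what delim_whitespace does, and compares token counts; the variable names are illustrative only.

```python
# The two example lines from the question
line_ok = ("loaded lane 1 mat. type= 2 leffect= 1 span= 200. space= 10. "
           "beta= 3.474 loadeffect 5075. lmax= 3643. cov= .13")
line_bad = ("loaded lane 1 mat. type= 3 leffect= 1 span= 200. space= 10. "
            "beta= 3.515 loadeffect10009. lmax= 9732. cov= .08")

tokens_ok = line_ok.split()
tokens_bad = line_bad.split()

# "loadeffect10009." is a single token, so the second line is one token short
print(len(tokens_ok), len(tokens_bad))

# Column 15 holds a number in the first line but a label in the second
print(tokens_ok[15], tokens_bad[15])
```

Because every token after the fused one moves one position to the left, the fixed usecols indices pick up labels instead of numbers, and the last index finds nothing at all.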
Then I decided to use a regular expression to define my delimiters. After many trial-and-error runs (I am no expert in regex), I managed to get close with the following line:
df = pd.read_csv('beta.txt', header=None, sep='/s +|loaded lane|mat. type=|leffect=|span=|space=|beta=|loadeffect|lmax=|cov=', engine='python')
This almost works, but for some reason it creates a NaN column at the beginning:
632 nan 1 2 1 200 10 3.474 5075 3643 0.13
633 nan 1 3 1 200 10 3.515 10009 9732 0.08
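A likely explanation for the extra column is that every line starts with one of the delimiters ("loaded lane"). Splitting on a pattern that matches at position 0 yields an empty first field. The sketch below shows this with re.split and a shortened, hypothetical pattern that is not the full one from the question.

```python
import re

line = "loaded lane 1 mat. type= 2"

# Shortened illustrative pattern: only two of the labels, with the dot escaped
pattern = r"loaded lane|mat\. type="

fields = re.split(pattern, line)

# The text before the first delimiter is empty, so the first field is ''
print(fields)
```

An empty field like this is read as a missing value, which would show up as the NaN column. Dropping it after loading, for example with df.iloc[:, 1:], is a reasonable workaround.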
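At this point I think I can just delete the first column and get away with it. However, I wonder what the correct way would be to set up the regex so that it parses this text file correctly in one shot. Any ideas? Other than that, I am sure there is a smarter way to parse this text file. I would be glad to hear your recommendations.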
Thanks!
import re
import csv
import pandas as pd

new_list = []
with open("parsing.txt") as csvfile:  # open the text file
    reader = csv.reader(csvfile)
    for line in reader:
        for i in line:
            # collect every integer or decimal number in the field, as strings
            new_list.append(re.findall(r'(\d*\.\d+|\d+)', i))

table = pd.DataFrame(new_list)
table  # output the pandas DataFrame values
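Note that re.findall returns strings, so the resulting DataFrame columns hold text rather than numbers. A minimal sketch of the same idea with a float conversion follows; it reuses the regex from the answer above, and parse_line is a hypothetical helper name, not part of pandas or of the original answer.

```python
import re

# Same pattern as in the answer: decimals such as "3.474" or ".13", or plain integers
NUMBER = re.compile(r"\d*\.\d+|\d+")

def parse_line(line):
    """Return every number found in the line as a float, ignoring the labels."""
    return [float(token) for token in NUMBER.findall(line)]

sample = ("loaded lane 1 mat. type= 3 leffect= 1 span= 200. space= 10. "
          "beta= 3.515 loadeffect10009. lmax= 9732. cov= .08")
row = parse_line(sample)
print(row)
```

A list of such rows, one per line of the file, can be passed to pd.DataFrame in the same way as new_list above. This approach does not depend on whitespace, so the fused loadeffect10009. token is handled like any other.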