Date parse error in Python pandas while reading file
Follow-on question to: Python pandas reading in file date
I am not able to parse the dates in the dataframe below. The code is as follows:
df = pandas.read_csv(file_name, skiprows=2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True, date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))
oth-000.opc xkn1= 0.500000e-01
y m d prcp vwc1
2006 1 1 0.0 0.17608e+00
2006 1 2 6.0 0.21377e+00
2006 1 3 0.1 0.22291e+00
2006 1 4 3.0 0.23460e+00
2006 1 5 6.7 0.26076e+00
I get an error saying: lambda () takes 1 argument (3 given)
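The message is consistent with the parser function being called with one argument per date column (three here) instead of a single combined string. The following is a minimal standard-library sketch of that mismatch, not pandas' internal behavior; `parse_one` and `parse_three` are hypothetical names introduced for illustration.

```python
from datetime import datetime


def parse_one(x):
    # Mirrors the lambda in the question: accepts exactly one string.
    return datetime.strptime(x, "%Y %m %d")


def parse_three(y, m, d):
    # Accepts the three column values separately and joins them first.
    return datetime.strptime("{} {} {}".format(y, m, d), "%Y %m %d")


# Calling the one-argument parser with three values raises TypeError.
try:
    parse_one("2006", "1", "1")
    one_arg_error = None
except TypeError as exc:
    one_arg_error = exc

# The three-argument version handles the same call.
parsed = parse_three("2006", "1", "1")
```

A parser written like `parse_three` accepts year, month, and day separately, so it works whether or not the columns were joined beforehand.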
Based on @EdChum's comment below, if I use this code:
df = pandas.read_csv(file_name, skiprows=2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)
df.index results in an index of dtype object, not a datetime series:
df.index
Index([u'2006 1 1', u'2006 1 2', ...., u'nan nan nan'], dtype='object')
Finally, the file is available here:
OK, I see the problem: your file had extraneous blank lines at the end. Unfortunately this messes up the parser, because it is looking for whitespace, and it caused the df to look like the following:
Out[25]:
             prcp     vwc1
datetime
2006 1 1      0.0  0.17608
2006 1 2      6.0  0.21377
2006 1 3      0.1  0.22291
2006 1 4      3.0  0.23460
2006 1 5      6.7  0.26076
nan nan nan   NaN      NaN
When I remove the blank lines, it imports and parses the dates fine:
Out[26]:
            prcp     vwc1
datetime
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076
And the index is a DatetimeIndex as desired:
In [27]: df.index
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None
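If editing the file by hand is not practical, one alternative (not the approach used above) is to read the columns as plain numbers, drop any rows that come through completely empty, and build the dates afterwards. This is a rough sketch under several assumptions: `SAMPLE` is a hypothetical in-memory copy of the first rows of the file with blank lines appended, `read_with_dates` is an illustrative helper name, and `skiprows=1` matches this copy rather than the original file.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample data, with trailing blank lines.
SAMPLE = """oth-000.opc xkn1= 0.500000e-01
y m d prcp vwc1
2006 1 1 0.0 0.17608e+00
2006 1 2 6.0 0.21377e+00
2006 1 3 0.1 0.22291e+00


"""


def read_with_dates(buf):
    # Skip the one-line file header; the next line supplies the column names.
    df = pd.read_csv(buf, skiprows=1, sep=r"\s+")
    # Drop any rows that were read as entirely empty.
    df = df.dropna(how="all")
    # Assemble dates from the separate year, month, and day columns.
    dates = pd.to_datetime(
        {
            "year": df["y"].astype(int),
            "month": df["m"].astype(int),
            "day": df["d"].astype(int),
        }
    )
    df = df.drop(columns=["y", "m", "d"])
    df.index = pd.DatetimeIndex(dates, name="datetime")
    return df


df = read_with_dates(io.StringIO(SAMPLE))
```

Building the dates after reading keeps the parsing step independent of how `read_csv` combines date columns, which has changed between pandas versions.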