Python pandas: reading in a file with date columns
In the file below, the 3rd line is the header, with the y, m and d columns giving the year, month and day respectively. However, I am not able to read them in using this code:
    df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))
    oth-000.opc
    xkn1= 0.500000e-01
    y m d prcp vwc1
    2006 1 1 0.0 0.17608e+00
    2006 1 2 6.0 0.21377e+00
    2006 1 3 0.1 0.22291e+00
    2006 1 4 3.0 0.23460e+00
    2006 1 5 6.7 0.26076e+00
I get a KeyError: list index out of range. Any suggestions?
The default separator in read_csv is a comma. Your file doesn't use commas as separators, so you're getting only one big column:
    >>> pd.read_csv(file_name, skiprows = 2)
        y m d prcp vwc1
    0   2006 1 1 0.0 0.17608e+00
    1   2006 1 2 6.0 0.21377e+00
    2   2006 1 3 0.1 0.22291e+00
    3   2006 1 4 3.0 0.23460e+00
    4   2006 1 5 6.7 0.26076e+00
    >>> pd.read_csv(file_name, skiprows = 2).columns
    Index([u' y m d prcp vwc1 '], dtype='object')
You should be able to use delim_whitespace=True:
    >>> df = pd.read_csv(file_name, skiprows = 2, delim_whitespace=True, parse_dates={"datetime": [0,1,2]}, index_col="datetime")
    >>> df
                prcp     vwc1
    datetime
    2006-01-01   0.0  0.17608
    2006-01-02   6.0  0.21377
    2006-01-03   0.1  0.22291
    2006-01-04   3.0  0.23460
    2006-01-05   6.7  0.26076
    >>> df.index
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2006-01-01, ..., 2006-01-05]
    Length: 5, Freq: None, Timezone: None
(I didn't specify date_parser, because I'm lazy and the dates are read correctly by default, but it's not a bad habit to be explicit.)
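If your pandas version warns about delim_whitespace, date_parser, or the dict form of parse_dates, a variant that should behave the same way is to split on whitespace with sep=r"\s+" and assemble the dates afterwards with pd.to_datetime. This is only a sketch: the helper name read_with_dates is made up for illustration, and io.StringIO stands in for the real file, using the sample lines from the question.

```python
import io

import pandas as pd

# Sample contents copied from the question; io.StringIO stands in for the real file.
sample = """oth-000.opc
xkn1= 0.500000e-01
y m d prcp vwc1
2006 1 1 0.0 0.17608e+00
2006 1 2 6.0 0.21377e+00
2006 1 3 0.1 0.22291e+00
2006 1 4 3.0 0.23460e+00
2006 1 5 6.7 0.26076e+00
"""


def read_with_dates(source):
    """Hypothetical helper: read the whitespace-separated file and index it by date."""
    # Skip the two preamble lines; the third line holds the column names.
    df = pd.read_csv(source, skiprows=2, sep=r"\s+")
    # pd.to_datetime can assemble dates from columns named year, month and day.
    parts = df[["y", "m", "d"]].rename(columns={"y": "year", "m": "month", "d": "day"})
    dates = pd.to_datetime(parts)
    # Drop the separate date parts and use the combined dates as the index.
    df = df.drop(columns=["y", "m", "d"])
    df.index = pd.DatetimeIndex(dates, name="datetime")
    return df


df = read_with_dates(io.StringIO(sample))
print(df)
```

With a real file you would pass file_name instead of the StringIO object. Building the dates in a separate step also makes it easy to check the y, m and d columns first if the parse ever fails.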